I am trying to write some code to define some boundary layers.
The data is arranged in the table like this (but there are many more rows):
Depth (um) | Replicate 1 (O2 Sat %) | Replicate 2 (O2 Sat %) | Replicate 3 (O2 Sat %) |
---|---|---|---|
0 | 10 | 11 | 11 |
-100 | 11 | 11 | 12 |
-200 | 13 | 12 | 11 |
-300 | 14 | 13 | 14 |
-400 | 15 | 15 | 15 |
-500 | 16 | 16 | 16 |
For each of these replicates I want to find the size of the boundary layer. I am defining the boundary layer as the distance above the surface (0um) at which the changes are <5% per 100um for for subsequent measurements. So I need to find the depth of the first row that results in this definition not being met. I also need it to make sure that the function is using rows 1-4 then 2-5 then 3-6 and so on not just moving down the rows 1-4,5-8 etc etc so that I can identify the first time this boundary layer definition is not met. I would like to detect this change for each replicate.
I have tried looking for some ways to approach it but I am not sure I am asking the correct questions because I am not exactly sure which type of functions to start with. I am assuming this may be some type of threshold or cutoff type function but I thought I would get some ideas on how to proceed as my searching was not getting me anywhere.
I appreciate any advice or ideas on how to get started on this. Thank you for your time in advance.
CodePudding user response:
I think something like this can be accomplished with diff
and a moving sum (via stats::filter
):
Make some example data:
dat <- read.table(text="
Depth Replicate1 Replicate2 Replicate3
0 10 11 11
-100 11 11 12
-200 13 12 11
-300 14 13 14
-400 15 15 15
-500 16 16 16", header=TRUE)
Set some variables:
cutoff <- 0.05
windowsize <- 4
Calculate the percentage difference row to row:
percdiff <- diff(as.matrix(dat[-1])) / dat[-nrow(dat), -1]
percdiff
# Replicate1 Replicate2 Replicate3
#1 0.10000000 0.00000000 0.09090909
#2 0.18181818 0.09090909 -0.08333333
#3 0.07692308 0.08333333 0.27272727
#4 0.07142857 0.15384615 0.07142857
#5 0.06666667 0.06666667 0.06666667
Check the percentage difference is above 5% at each row:
percdiff_co <- percdiff > cutoff
percdiff_co
# Replicate1 Replicate2 Replicate3
#1 TRUE FALSE TRUE
#2 TRUE TRUE FALSE
#3 TRUE TRUE TRUE
#4 TRUE TRUE TRUE
#5 TRUE TRUE TRUE
Calculate a moving sum for each replicate in the 4-observation window:
out <- stats::filter(percdiff_co, rep(1,windowsize), sides=1)
out
#Time Series:
#Start = 1
#End = 5
#Frequency = 1
# [,1] [,2] [,3]
#1 NA NA NA
#2 NA NA NA
#3 NA NA NA
#4 4 3 3
#5 4 4 3
Check if the moving sum of greater than the cutoff is always TRUE
:
out <- out == windowsize
tail(out, -(windowsize-1))
# [,1] [,2] [,3]
#[1,] TRUE FALSE FALSE
#[2,] TRUE TRUE FALSE
CodePudding user response:
Here is a potential solution using dplyr functions, but I don't understand what you mean by
I also need it to make sure that the function is using rows 1-4 then 2-5 then 3-6 and so on not just moving down the rows 1-4,5-8 etc etc so that I can identify the first time this boundary layer definition is not met.
so, this may not properly answer your question:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df <- data.frame(
check.names = FALSE,
`Depth (um)` = c(0L, -100L, -200L, -300L, -400L, -500L),
`Replicate 1 (O2 Sat %)` = c(10L, 11L, 13L, 14L, 15L, 16L),
`Replicate 2 (O2 Sat %)` = c(11L, 11L, 12L, 13L, 15L, 16L),
`Replicate 3 (O2 Sat %)` = c(11L, 12L, 11L, 14L, 15L, 16L)
)
# To 'replace' the columns:
df %>%
mutate(across(starts_with("Replicate"),
~as.logical(cummax(abs(.x / lag(.x, default = first(.x)) - 1) > 0.05))))
#> Depth (um) Replicate 1 (O2 Sat %) Replicate 2 (O2 Sat %)
#> 1 0 FALSE FALSE
#> 2 -100 TRUE FALSE
#> 3 -200 TRUE TRUE
#> 4 -300 TRUE TRUE
#> 5 -400 TRUE TRUE
#> 6 -500 TRUE TRUE
#> Replicate 3 (O2 Sat %)
#> 1 FALSE
#> 2 TRUE
#> 3 TRUE
#> 4 TRUE
#> 5 TRUE
#> 6 TRUE
# To add the columns to the existing dataframe:
df %>%
mutate(across(starts_with("Replicate"),
~as.logical(cummax(abs(.x / lag(.x, default = first(.x)) - 1) > 0.05)),
.names = "{paste0(.col, ' <5% change')}")) %>%
select(1, 2, 5, 3, 6, 4, 7)
#> Depth (um) Replicate 1 (O2 Sat %) Replicate 1 (O2 Sat %) <5% change
#> 1 0 10 FALSE
#> 2 -100 11 TRUE
#> 3 -200 13 TRUE
#> 4 -300 14 TRUE
#> 5 -400 15 TRUE
#> 6 -500 16 TRUE
#> Replicate 2 (O2 Sat %) Replicate 2 (O2 Sat %) <5% change
#> 1 11 FALSE
#> 2 11 FALSE
#> 3 12 TRUE
#> 4 13 TRUE
#> 5 15 TRUE
#> 6 16 TRUE
#> Replicate 3 (O2 Sat %) Replicate 3 (O2 Sat %) <5% change
#> 1 11 FALSE
#> 2 12 TRUE
#> 3 11 TRUE
#> 4 14 TRUE
#> 5 15 TRUE
#> 6 16 TRUE
Created on 2022-07-25 by the reprex package (v2.0.1)