Good day,
I've been trying my best, but not quite getting there. I'm trying to iterate through the values in a vector (df$sample) and, for each one, find the first subsequent value that is 20% lower than the current value. I want to do this for each row (sample) and write the date of the found value to a new column.
Here's my df:
date sample
591 2020-02-14 0.008470
590 2020-02-15 0.008460
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
586 2020-02-19 0.007300
585 2020-02-20 0.006604
584 2020-02-21 0.006843
583 2020-02-22 0.006687
582 2020-02-23 0.006991
581 2020-02-24 0.007333
580 2020-02-25 0.006738
579 2020-02-26 0.006279
...
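For anyone who wants to reproduce this, here is a sketch that rebuilds only the 13 rows shown above (the rows hidden by "..." are not included). It assumes `date` is meant to be a `Date` and `sample` a numeric column, and keeps the original row numbers as row names.

```r
# Rebuild the 13 rows shown in the question (rows elided by "..." are not included).
# Assumption: date should be a Date and sample a numeric column.
df <- data.frame(
  date = seq(as.Date("2020-02-14"), by = "day", length.out = 13),
  sample = c(0.008470, 0.008460, 0.007681, 0.007144, 0.007262, 0.007300,
             0.006604, 0.006843, 0.006687, 0.006991, 0.007333, 0.006738,
             0.006279),
  row.names = 591:579  # keep the original (descending) row numbers
)
str(df)
```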
My shoddy attempts have used Position() or which(). I thought I could wrap either of them in a for loop, but my attempts are not quite right.
for(i in length(df)){
df$conc20 <- Position(function(x) x < df$sample[i]*0.80, df$sample)
}
or
for(i in length(df)){
df$conc20 <- min(which(df$sample < df$sample[i]*0.8))
}
}
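Both loops share the same problems: `length(df)` is the number of columns (a single number, so the loop body runs only once), each pass overwrites the whole `conc20` column, and the search runs over the entire vector instead of only the rows after `i`. A minimal sketch of the Position() idea with those three things fixed, assuming a `df` with a `Date` column `date` and a numeric column `sample` (the data below is just the 13 rows shown):

```r
df <- data.frame(
  date = seq(as.Date("2020-02-14"), by = "day", length.out = 13),
  sample = c(0.008470, 0.008460, 0.007681, 0.007144, 0.007262, 0.007300,
             0.006604, 0.006843, 0.006687, 0.006991, 0.007333, 0.006738,
             0.006279)
)

df$conc20 <- as.Date(NA)              # Date column of NAs, filled in below
for (i in seq_len(nrow(df))) {        # one pass per row, not per column
  later <- df$sample[-seq_len(i)]     # only the values after row i
  # offset of the first later value below 80% of sample[i]; NA if there is none
  j <- Position(function(x) x < df$sample[i] * 0.80, later)
  df$conc20[i] <- df$date[i + j]      # NA offset gives an NA date
}
df
```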
I even found a dplyr example that got close to what I was looking for.
Ideally:
date sample conc20
591 2020-02-14 0.008470 2020-02-25
590 2020-02-15 0.008460 ...
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
...
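The `conc20` column can also be built without an explicit loop by applying the which() idea to each row with vapply(). This is only a sketch: `first_drop_date` and its `factor` argument are hypothetical names, and it assumes `date` is a `Date` and `sample` is numeric.

```r
# Hypothetical helper: for each row i, the date of the first later row whose
# sample is below factor * sample[i]; NA when no later row qualifies.
first_drop_date <- function(sample, date, factor = 0.80) {
  idx <- vapply(seq_along(sample), function(i) {
    hits <- which(sample[-seq_len(i)] < sample[i] * factor)  # positions after i
    if (length(hits) == 0) NA_integer_ else i + hits[1]      # absolute row index
  }, integer(1))
  date[idx]
}

df <- data.frame(
  date = seq(as.Date("2020-02-14"), by = "day", length.out = 13),
  sample = c(0.008470, 0.008460, 0.007681, 0.007144, 0.007262, 0.007300,
             0.006604, 0.006843, 0.006687, 0.006991, 0.007333, 0.006738,
             0.006279)
)
df$conc20 <- first_drop_date(df$sample, df$date)
df
```

On the 13 rows shown, this gives 2020-02-20 for the first two rows and NA for the rest: 0.006604 on 2020-02-20 is already below 0.8 × 0.008470 = 0.006776, and no later row within these 13 is 20% below any of the other values.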
I'm happy to provide any clarification. I really do appreciate the help!
Answer:
I am assuming that by 'first' you mean counting backwards, i.e. the nearest earlier row.
df<- read.csv( sep = " ", text=
"row date sample
591 2020-02-14 0.008470
590 2020-02-15 0.008460
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
586 2020-02-19 0.007300
585 2020-02-20 0.006604
584 2020-02-21 0.006843
583 2020-02-22 0.006687
582 2020-02-23 0.006991
581 2020-02-24 0.007333
580 2020-02-25 0.006738
579 2020-02-26 0.006279")
df
#there is no row 20% below, so I am just using 2% below
# and multiplying 0.98 instead of 0.8
f_crossover<- function( i ){
cutoff= df$sample[i]
max(which( df$sample[1:i]<= cutoff*0.98), -1)
}
# This new column gives the row that has crossed our cutoff.
# No need to add it as a column, but this makes comparison easy.
# If there is no such row, we get -1.
df$crossover= sapply( seq_along(df$sample) , FUN = f_crossover )
# Some more columns to test rows 10 and 11
df$tester10= df$sample/df$sample[10]
df$tester11= df$sample/df$sample[11]
df
## row date sample crossover tester10 tester11
##1 591 2020-02-14 0.008470 -1 1.2115577 1.1550525
##2 590 2020-02-15 0.008460 -1 1.2101273 1.1536888
##3 589 2020-02-16 0.007681 -1 1.0986983 1.0474567
##4 588 2020-02-17 0.007144 -1 1.0218853 0.9742261
##5 587 2020-02-18 0.007262 -1 1.0387641 0.9903177
##6 586 2020-02-19 0.007300 4 1.0441997 0.9954998
##7 585 2020-02-20 0.006604 -1 0.9446431 0.9005864
##8 584 2020-02-21 0.006843 7 0.9788299 0.9331788
##9 583 2020-02-22 0.006687 -1 0.9565155 0.9119051
##10 582 2020-02-23 0.006991 9 1.0000000 0.9533615
##11 581 2020-02-24 0.007333 10 1.0489200 1.0000000
##12 580 2020-02-25 0.006738 -1 0.9638106 0.9188599
##13 579 2020-02-26 0.006279 -1 0.8981548 0.8562662
#some visual proof
plot(df$sample, type='h')
abline(h=df$sample[11])
abline(h=df$sample[11]*0.98, col='red')
# which( df$sample[1:10]<= df$sample[10]*0.98)
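Since the question asks for a date rather than a row number, the crossover index can be turned into a date column directly, using NA instead of -1 for "no such row". This is a sketch built on the answer's backwards interpretation, not part of the original answer; `prev_drop_date` and `factor` are hypothetical names, and 0.98 is kept only to match the demonstration above.

```r
df <- data.frame(
  date = seq(as.Date("2020-02-14"), by = "day", length.out = 13),
  sample = c(0.008470, 0.008460, 0.007681, 0.007144, 0.007262, 0.007300,
             0.006604, 0.006843, 0.006687, 0.006991, 0.007333, 0.006738,
             0.006279)
)

# Hypothetical helper: for each row i, the date of the most recent earlier row
# whose sample is <= factor * sample[i]; NA (instead of -1) when none exists.
prev_drop_date <- function(sample, date, factor = 0.98) {
  idx <- vapply(seq_along(sample), function(i) {
    hits <- which(sample[seq_len(i - 1)] <= sample[i] * factor)  # earlier rows only
    if (length(hits) == 0) NA_integer_ else max(hits)            # nearest one
  }, integer(1))
  date[idx]
}

df$crossover_date <- prev_drop_date(df$sample, df$date)
df
```

With factor = 0.98 this picks the same rows as the crossover column above (4, 7, 9 and 10 for rows 6, 8, 10 and 11), just reported as dates.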
Answer:
Quite messy, but this should do the trick
library(dplyr)
df<- read.csv( sep = " ", text=
"row date sample
591 2020-02-14 0.008470
590 2020-02-15 0.008460
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
586 2020-02-19 0.007300
585 2020-02-20 0.006604
584 2020-02-21 0.006843
583 2020-02-22 0.006687
582 2020-02-23 0.006991
581 2020-02-24 0.007333
580 2020-02-25 0.006738
579 2020-02-26 0.006279")
x <- 1.05
df <- df %>%
  mutate(id = 1:n()) %>%
  rowwise() %>%
  mutate(greater_row =
           first(which(sample * x <
                         df$sample[id:nrow(df)]) + id - 1)) %>%
  ungroup()
df$greater_row <- df$date[df$greater_row]
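This lets you set x to any factor you want. Note that, as written, it finds the first later row whose sample is greater than x times the current one; for the 20% drop in the question, reverse the comparison (df$sample[id:nrow(df)] < sample * x) and use x <- 0.8.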