In R, find the first incidence (row) of a value that is x amount greater/less than the current value

Good day,

I've been trying my best but am not quite getting there. I'm trying to iterate through the values in a vector (df$sample) and, for each one, find the first subsequent occurrence of a value that is at least 20% less than the current value. I want to do this for each row (sample) and write the date of the found value to a new column.

Here's my df:

    date       sample
591 2020-02-14 0.008470
590 2020-02-15 0.008460
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
586 2020-02-19 0.007300
585 2020-02-20 0.006604
584 2020-02-21 0.006843
583 2020-02-22 0.006687
582 2020-02-23 0.006991
581 2020-02-24 0.007333
580 2020-02-25 0.006738
579 2020-02-26 0.006279
...

My shoddy attempts have used Position() or which(). I thought I could wrap either of them in a for loop, but my attempts are not quite right.

for(i in length(df)){
  df$conc20 <- Position(function(x) x < df$sample[i]*0.80, df$sample)
}

for(i in length(df)){
  df$conc20 <- min(which(df$sample < df$sample[i]*0.8))
}

I even found a dplyr example that got close to what I was looking for.

Ideally:

    date       sample   conc20
591 2020-02-14 0.008470 2020-02-25
590 2020-02-15 0.008460 ...
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
...

I'm happy to provide any clarification. I really do appreciate the help!

CodePudding user response：

I am assuming that when we say 'first', we are counting backwards, i.e. searching the rows before the current one.

df <- read.csv(sep = " ", text =
"row date sample
591 2020-02-14 0.008470
590 2020-02-15 0.008460
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
586 2020-02-19 0.007300
585 2020-02-20 0.006604
584 2020-02-21 0.006843
583 2020-02-22 0.006687
582 2020-02-23 0.006991
581 2020-02-24 0.007333
580 2020-02-25 0.006738
579 2020-02-26 0.006279"
)
df


# Looking backwards, there is no row 20% below, so I am just using 2% below
# and multiplying by 0.98 instead of 0.8

# For row i, return the index of the most recent row (up to and including i)
# whose sample is at most 98% of the sample in row i, or -1 if there is none
f_crossover <- function(i) {
  cutoff <- df$sample[i]
  max(which(df$sample[1:i] <= cutoff * 0.98), -1)
}

# This new column gives the row that has crossed our cutoff
# No need to add it as a column, but this makes comparison easy
# If there is no such row, we get -1
df$crossover <- sapply(seq_along(df$sample), FUN = f_crossover)

#Some more columns to test for row #10 and 11
df$tester10= df$sample/df$sample[10]
df$tester11= df$sample/df$sample[11]

df

##   row       date   sample crossover  tester10  tester11
##1  591 2020-02-14 0.008470        -1 1.2115577 1.1550525
##2  590 2020-02-15 0.008460        -1 1.2101273 1.1536888
##3  589 2020-02-16 0.007681        -1 1.0986983 1.0474567
##4  588 2020-02-17 0.007144        -1 1.0218853 0.9742261
##5  587 2020-02-18 0.007262        -1 1.0387641 0.9903177
##6  586 2020-02-19 0.007300         4 1.0441997 0.9954998
##7  585 2020-02-20 0.006604        -1 0.9446431 0.9005864
##8  584 2020-02-21 0.006843         7 0.9788299 0.9331788
##9  583 2020-02-22 0.006687        -1 0.9565155 0.9119051
##10 582 2020-02-23 0.006991         9 1.0000000 0.9533615
##11 581 2020-02-24 0.007333        10 1.0489200 1.0000000
##12 580 2020-02-25 0.006738        -1 0.9638106 0.9188599
##13 579 2020-02-26 0.006279        -1 0.8981548 0.8562662

#some visual proof 
plot(df$sample, type='h')
abline(h=df$sample[11])
abline(h=df$sample[11]*0.98, col='red')

# which( df$sample[1:10]<= df$sample[10]*0.98)
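
The answer above searches backwards and uses a 2% cutoff. For the forward-looking 20% case described in the question, a minimal base R sketch might look like the following. The helper name `first_drop_after` and its `ratio` argument are made up for illustration, and the data frame is rebuilt here with a Date column so that the block runs on its own.

```r
# Sample data from the question (original row names dropped)
df <- data.frame(
  date = as.Date("2020-02-14") + 0:12,
  sample = c(0.008470, 0.008460, 0.007681, 0.007144, 0.007262, 0.007300,
             0.006604, 0.006843, 0.006687, 0.006991, 0.007333, 0.006738,
             0.006279)
)

# Hypothetical helper: index of the first row after i whose value is below
# `ratio` times the value in row i, or NA if no later row qualifies.
first_drop_after <- function(values, i, ratio = 0.8) {
  n <- length(values)
  if (i >= n) return(NA_integer_)        # the last row has nothing after it
  later <- (i + 1):n                     # only look at rows after row i
  hits <- which(values[later] < values[i] * ratio)
  if (length(hits) == 0) NA_integer_ else later[hits[1]]
}

idx <- vapply(seq_along(df$sample),
              function(i) first_drop_after(df$sample, i),
              integer(1))

# Indexing with NA yields an NA date, so rows without a match stay empty
df$conc20 <- df$date[idx]
df
```

On the posted sample this gives 2020-02-20 for the first two rows and NA for the rest, because 0.006604 is already below 0.8 * 0.008470 = 0.006776. That is a few days earlier than the date in the question's illustration, so it is worth checking which rule is actually intended.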

CodePudding user response：

Quite messy, but this should do the trick

library(dplyr)
df <- read.csv(sep = " ", text =
"row date sample
591 2020-02-14 0.008470
590 2020-02-15 0.008460
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
586 2020-02-19 0.007300
585 2020-02-20 0.006604
584 2020-02-21 0.006843
583 2020-02-22 0.006687
582 2020-02-23 0.006991
581 2020-02-24 0.007333
580 2020-02-25 0.006738
579 2020-02-26 0.006279"
)

x <- 1.05

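df <- df %>%
  mutate(id = 1:n()) %>%
  rowwise() %>%
  # index of the first row, from the current one onwards, whose sample
  # exceeds the current sample times x (NA if there is none)
  mutate(greater_row =
           first(which(sample * x <
                         df$sample[id:nrow(df)]) +
                   id - 1)) %>%
  ungroup()
# replace the row index with the date of that row
df$greater_row <- df$date[df$greater_row]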

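This should allow you to set x to any factor you want. With x = 1.05 it finds the first later value that is more than 5% greater; to find a value that is 20% less, set x <- 0.8 and flip the comparison from < to >.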
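
Since the title asks about both "greater" and "less", here is a rough base R sketch of a direction-aware version. It is not the dplyr approach from the answer. The function name `first_crossing` is an assumption for illustration, and the rule that x > 1 means "look for a larger value" while x < 1 means "look for a smaller one" is a convention chosen here.

```r
# Sample values from the question
samples <- c(0.008470, 0.008460, 0.007681, 0.007144, 0.007262, 0.007300,
             0.006604, 0.006843, 0.006687, 0.006991, 0.007333, 0.006738,
             0.006279)

# Hypothetical helper: for every element, the index of the first later
# element that crosses x times the current value, or NA if none does.
# x > 1 searches for a larger value, x < 1 for a smaller one.
first_crossing <- function(values, x) {
  n <- length(values)
  vapply(seq_len(n), function(i) {
    later <- seq_len(n - i) + i          # indices after i; empty when i == n
    target <- values[i] * x
    hit <- if (x > 1) values[later] > target else values[later] < target
    if (any(hit)) later[which(hit)[1]] else NA_integer_
  }, integer(1))
}

up5    <- first_crossing(samples, 1.05)  # first later value more than 5% higher
down20 <- first_crossing(samples, 0.80)  # first later value more than 20% lower
```

With the sample values, `up5` is 10, 11, 11 for rows 7 to 9 and NA elsewhere, and `down20` is 7 for the first two rows and NA elsewhere. The indices can be turned into dates with `df$date[up5]`, as in the answer above.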