Home > database >  How to evaluate conditions met on previous values within a group in a data.frame in R
How to evaluate conditions met on previous values within a group in a data.frame in R

Time:03-04

Thanks in advance for your help on this question you lovely people of SQ , I have a dataset that looks like this:

library(tidyverse)   
 name<-c("AAA","AAA","AAA")
    value<-c(1:3)
    order<-c(2,3,6)
    tag<-c(0,0,0)
    part_a<-data.frame(name,value,order,tag)
    name<-c("AAA","AAA","AAA")
    value<-c(1:3)
    order<-c(1,5,4)
    key<-c(1,1,1)
    part_b<-data.frame(name,value,order,key)
    df<-bind_rows(part_a,part_b)
    df %>% group_by(name) %>% arrange(order)


    Groups:   name [1]
      name  value order   tag   key
      <chr> <int> <dbl> <dbl> <dbl>
    1 AAA       1     1    NA     1
    2 AAA       1     2     0    NA
    3 AAA       2     3     0    NA
    4 AAA       3     4    NA     1
    5 AAA       2     5    NA     1
    6 AAA       3     6     0    NA

I am trying to write a code that will allow me to create two new variables named CHECK_1 and CHECK_2 , CHECK_1 needs to have two values it will be 1 if the previous value of the column tag is NA and the previous value of the column key is not NA and 0 otherwise, the variable CHECK_2 needs to have only two value it should be "out of range" if the difference between the current and previous value of the column order is greater than 3 and "within range" otherwise. This operations must be done within groups in a data frame, I know I can make use of the base R function lag() but I have tried to incorporate the criteria in a if_else statement with no luck, thank you so much your help guys! from the bottom of my heart

my desired output will be:

name value order tag key check_1 check_2
AAA 1 1 NA 1 1 "within range"
AAA 1 2 0 NA 0 "within range"
AAA 2 3 0 NA 0 "within range"
AAA 3 4 NA 1 1 "within range"
AAA 2 5 NA 1 1 "within range"
AAA 3 6 0 NA 0 "within range"

CodePudding user response:

You could do something like this with tidyverse. However, I am unsure what you want to do with the first row of values for the group.

library(tidyverse)

df %>% 
  group_by(name) %>% 
  arrange(order) %>% 
  mutate(check_1 = ifelse(is.na(lag(tag)) & !is.na(lag(key)), 1, 0),
         check_2 = ifelse(order - lag(order) > 3, "out of range", "within range"))

Output

  name  value order   tag   key check_1 check_2     
  <chr> <int> <dbl> <dbl> <dbl>   <dbl> <chr>       
1 AAA       1     1    NA     1       0 NA          
2 AAA       1     2     0    NA       1 within range
3 AAA       2     3     0    NA       0 within range
4 AAA       3     4    NA     1       0 within range
5 AAA       2     5    NA     1       1 within range
6 AAA       3     6     0    NA       1 within range

If you are wanting to default the first row values to 1 and "within range", then we could use case_when to make an extra ifelse condition.

df %>% 
  group_by(name) %>% 
  arrange(order) %>% 
  mutate(check_1 = case_when(row_number() == 1 ~ 1,
                             is.na(lag(tag)) & !is.na(lag(key)) ~ 1, 
                             TRUE ~ 0),
         check_2 = case_when(row_number() == 1 ~ "within range",
                             order - lag(order) > 3 ~ "out of range", 
                             TRUE ~ "within range"))

  name  value order   tag   key check_1 check_2     
  <chr> <int> <dbl> <dbl> <dbl>   <dbl> <chr>       
1 AAA       1     1    NA     1       1 within range
2 AAA       1     2     0    NA       1 within range
3 AAA       2     3     0    NA       0 within range
4 AAA       3     4    NA     1       0 within range
5 AAA       2     5    NA     1       1 within range
6 AAA       3     6     0    NA       1 within range
  • Related