Home > Blockchain >  How do I filter out data based on a value in column while capturing minimum date criteria in another
How do I filter out data based on a value in column while capturing minimum date criteria in another

Time:11-29

I need to get a limited data based on the criteria of Ind being 'Y'. But it should only capture the first row when Ind changes from 'N' or 'O' value to 'Y'. In the Check_Date it should update that first value.

Input Data:

ID  Date    Ind 
2   201905  N    
2   201906  N    
2   201907  N    
2   201908  N    
2   201909  N    
2   201910  N    
2   201911  N    
2   201912  Y   
2   202001  Y    
2   202002  Y    
2   202003  Y    
2   202004  Y    
2   202005  N    
2   202006  N    
2   202007  N    
2   202008  Y   
3   201906  N    

Result:

ID  Date    Ind Check_Date
2   201912  Y   201912
2   202008  Y   202008

I didn't find a complete approach when I searched and was only able to filter out the data with Ind as Y. When I applied minimum condition to the date based on below code, it gave me limited data with first instance of ID that was Y on a particular Date. What am I doing wrong?

library(dplyr)

PO %>% 
  group(ID)
  filter(Date == min(Date)) %>%
  filter(Ind == 'Y') %>%
  slice(1) %>% # takes the first occurrence if there is a tie
  ungroup()

CodePudding user response:

Filter to rows where Ind == "Y" in the current but not previous row:

library(dplyr)

PO %>%
  group_by(ID) %>%
  filter(Ind == "Y" & lag(Ind, default = "") != "Y") %>%
  ungroup() %>%
  mutate(Check_Date = Date)
# A tibble: 2 × 4
     ID   Date Ind   Check_Date
  <dbl>  <dbl> <chr>      <dbl>
1     2 201912 Y         201912
2     2 202008 Y         202008

CodePudding user response:

Using data.table

library(data.table)
setDT(df1)[df1[, !duplicated(Ind) & Ind == 'Y', .(ID, rleid(Ind))]$V1][, 
     Check_Date := Date][]

-output

      ID   Date    Ind Check_Date
   <int>  <int> <char>      <int>
1:     2 201912      Y     201912
2:     2 202008      Y     202008
  • Related