I need to get a limited data based on the criteria of Ind being 'Y'. But it should only capture the first row when Ind changes from 'N' or 'O' value to 'Y'. In the Check_Date it should update that first value.
Input Data:
ID Date Ind
2 201905 N
2 201906 N
2 201907 N
2 201908 N
2 201909 N
2 201910 N
2 201911 N
2 201912 Y
2 202001 Y
2 202002 Y
2 202003 Y
2 202004 Y
2 202005 N
2 202006 N
2 202007 N
2 202008 Y
3 201906 N
Result:
ID Date Ind Check_Date
2 201912 Y 201912
2 202008 Y 202008
I didn't find a complete approach when I searched and was only able to filter out the data with Ind as Y. When I applied minimum condition to the date based on below code, it gave me limited data with first instance of ID that was Y on a particular Date. What am I doing wrong?
library(dplyr)
PO %>%
group(ID)
filter(Date == min(Date)) %>%
filter(Ind == 'Y') %>%
slice(1) %>% # takes the first occurrence if there is a tie
ungroup()
CodePudding user response:
Filter to rows where Ind == "Y"
in the current but not previous row:
library(dplyr)
PO %>%
group_by(ID) %>%
filter(Ind == "Y" & lag(Ind, default = "") != "Y") %>%
ungroup() %>%
mutate(Check_Date = Date)
# A tibble: 2 × 4
ID Date Ind Check_Date
<dbl> <dbl> <chr> <dbl>
1 2 201912 Y 201912
2 2 202008 Y 202008
CodePudding user response:
Using data.table
library(data.table)
setDT(df1)[df1[, !duplicated(Ind) & Ind == 'Y', .(ID, rleid(Ind))]$V1][,
Check_Date := Date][]
-output
ID Date Ind Check_Date
<int> <int> <char> <int>
1: 2 201912 Y 201912
2: 2 202008 Y 202008