I'm not sure how to word the title better - I have a list of names, dates, and services. I want to find all instances of a specific service occurring only when there were 0 other services BEFORE the date of the specific one.
Example data below. The desired output would be ONLY returning row 5 because Bruce Wayne had a surgery with 0 services beforehand. John Doe is disqualified because there was a check-up beforehand and Jane Doe is disqualified because there was no surgery.
Extra question - Instead of checking for any occurrence beforehand, how would I check for any occurrence within the previous 6 months?
Date <- c("2022-01-01","2022-04-01","2022-05-01","2022-07-01","2022-08-01","2022-08-05")
Name <- c("John Doe","John Doe","John Doe","Jane Doe","Bruce Wayne","Bruce Wayne")
Service <- c("Check-up","Surgery","Follow-up", "Check-up", "Surgery", "Follow-up")
df <- data.frame(Date,Name,Service)
df
Date Name Service
1 2022-01-01 John Doe Check-up
2 2022-04-01 John Doe Surgery
3 2022-05-01 John Doe Follow-up
4 2022-07-01 Jane Doe Check-up
5 2022-08-01 Bruce Wayne Surgery
6 2022-08-05 Bruce Wayne Follow-up
CodePudding user response:
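I don't always trust the ordering of the frame, so I'd compare against the earliest date within each group instead of relying on row position. (Date is a character column here, but min() still works because the strings are in ISO yyyy-mm-dd format.)
library(dplyr)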
df %>%
group_by(Name) %>%
filter(Service == "Surgery", Date == min(Date)) %>%
ungroup()
# # A tibble: 1 × 3
# Date Name Service
# <chr> <chr> <chr>
# 1 2022-08-01 Bruce Wayne Surgery
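For the extra question (no other service within the previous 6 months, rather than none at any time before), here is a rough sketch of one way it could be done. The helper `no_prior_within()` is a hypothetical name introduced only for this illustration, and "6 months" is approximated as 183 days, which is an assumption; swap in a calendar-month calculation if exact months matter. Services on the same day as the surgery are not counted as "before".

```r
library(dplyr)

df <- data.frame(
  Date = c("2022-01-01", "2022-04-01", "2022-05-01",
           "2022-07-01", "2022-08-01", "2022-08-05"),
  Name = c("John Doe", "John Doe", "John Doe",
           "Jane Doe", "Bruce Wayne", "Bruce Wayne"),
  Service = c("Check-up", "Surgery", "Follow-up",
              "Check-up", "Surgery", "Follow-up")
)

# Hypothetical helper (not from the original answers): for each date in a
# group, TRUE if no other date in the group falls in the `days` days before it.
# days = 183 is an assumed approximation of 6 months.
no_prior_within <- function(dates, days = 183) {
  d <- as.numeric(dates)                    # days since 1970-01-01
  vapply(seq_along(d), function(i) {
    gap <- d[i] - d[-i]                     # positive = the other service was earlier
    !any(gap > 0 & gap <= days)
  }, logical(1))
}

result <- df %>%
  mutate(Date = as.Date(Date)) %>%          # real dates, so differences are in days
  group_by(Name) %>%
  filter(Service == "Surgery", no_prior_within(Date)) %>%
  ungroup()

result
```

On the sample data this should again keep only Bruce Wayne's surgery, since John Doe's check-up was 90 days before his surgery. With a shorter window, for example `no_prior_within(Date, days = 60)`, John Doe's surgery would qualify as well.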
CodePudding user response:
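You could filter on surgery and check whether it is the first row within each group, using row_number(). Note that this assumes the rows are already sorted by date within each Name: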
library(dplyr)
df %>%
group_by(Name) %>%
filter(Service == "Surgery" & row_number() == 1)
#> # A tibble: 1 × 3
#> # Groups: Name [1]
#> Date Name Service
#> <chr> <chr> <chr>
#> 1 2022-08-01 Bruce Wayne Surgery
Created on 2023-01-27 with reprex v2.0.2