I have data structured like below, but with many more columns.
I need to filter the data to include only instances where a person has a date of X and X 1.
In this example only person B and C should remain, and only the rows with directly adjacent dates. So rows 2,3,5,6 should be the only remaining ones.
Once it is filtered I need to count how many times this occurred as well as do calculations on the other values, likely summing up the Values column for the X 1 date.
Person <- c("A","B","B","B","C","C","D","D")
Date <- c("2021-01-01","2021-01-01","2021-01-02","2021-01-04","2021-01-09","2021-01-10","2021-01-26","2021-01-29")
Values <- c(10,15,6,48,71,3,1,3)
df <- data.frame(Person, Date, Values)
df
How would I accomplish this?
CodePudding user response:
end_points <- df %>%
mutate(Date = as.Date(Date)) %>%
group_by(Person) %>%
filter(Date - lag(Date) == 1 | lead(Date) - Date == 1) %>%
ungroup()
Result
end_points
# A tibble: 4 x 3
Person Date Values
<chr> <date> <dbl>
1 B 2021-01-01 15
2 B 2021-01-02 6
3 C 2021-01-09 71
4 C 2021-01-10 3
2nd part:
end_points %>%
group_by(Person) %>%
slice_max(Date) %>%
ungroup() %>%
summarize(total = sum(Values))