So I have data wrangling question.
Here is my example data.
year <- c(2019, 2019, 2020, 2020, 2021, 2021, 2021)
sn <- c("AB1001", "AB1002", "AB1001", "DC1001", "AB1002", "AB1001", "AB1003")
hours <- c(150, 173, 189, 102, 175, 215, 98)
delta_hours <- c(150, 173, 39, 102, NA, NA, NA)
df <- data.frame(year, sn, hours, delta_hours)
which produces this data frame:
year sn hours delta_hours
1 2019 AB1001 150 150
2 2019 AB1002 173 173
3 2020 AB1001 189 39
4 2020 DC1001 102 102
5 2021 AB1002 175 NA
6 2021 AB1001 215 NA
7 2021 AB1003 98 NA
What I am needing to do is mutate/update the delta_hours column for 2021. Specifically, for sn's that are unique (haven't been in previous years) I want to be able to simply copy over the information from the hours column. For those sn's that have appeared in previous years, I want to subtract the hours from the 2021 year by the hours in whatever the most recent year was. So for sn AB1001
, I would want to take 215 - 189
in order to get 26
. For sn AB1003
I want to simply copy over the 98
. And of course for any year prior to 2021, I wan to simply keep the information.
My final data frame should basically look like this:
year sn hours delta_hours
1 2019 AB1001 150 150
2 2019 AB1002 173 173
3 2020 AB1001 189 39
4 2020 DC1001 102 102
5 2021 AB1002 175 2
6 2021 AB1001 215 26
7 2021 AB1003 98 98
I'm assuming I need to use case_when
, and I can do that to get the data from prior years to simply copy over, but I'm not getting how to do the subtraction properly.
Help appreciated! Thanks!
CodePudding user response:
library(dplyr)
df %>%
group_by(sn) %>%
mutate(
delta_hours = if_else(year == 2021, hours - lag(hours), delta_hours),
delta_hours = if_else(is.na(delta_hours), hours, delta_hours)
)
# A tibble: 7 x 4
# Groups: sn [4]
year sn hours delta_hours
<dbl> <chr> <dbl> <dbl>
1 2019 AB1001 150 150
2 2019 AB1002 173 173
3 2020 AB1001 189 39
4 2020 DC1001 102 102
5 2021 AB1002 175 2
6 2021 AB1001 215 26
7 2021 AB1003 98 98