I have a quite large dataset where I need to aggregate some of the rows based on several conditions. But first I have to change the value of one of the variables; the date variable.
Below is an example:
df <- data.frame(
Date=c("2021-01-07", "2021-01-10", "2021-01-07", "2021-01-06", "2021-01-06"),
Specie=c("cod", "cod", "cod", "cod", "haddock"),
Size=c("small", "small", "medium", "small", "medium"),
category=c("A", "B", "A", "A", "A"),
Value=c(500, 50, 600, 750, 700)
)
> df
Date Specie Size Value category
1 2021-01-07 cod small 500 A
2 2021-01-10 cod small 50 B
3 2021-01-07 cod medium 600 A
4 2021-01-06 cod small 750 A
5 2021-01-06 haddock medium 700 A
I need to change the value of the "Date" variable, when "category"==B, to the same date as in "category"==A, when values in "Specie" and "Size" are equal in the two rows. In the example data above I want to change the date in line 2 to make it the same as the date in line 1, like this:
> df
Date Specie Size Value category
1 2021-01-07 cod small 500 A
2 2021-01-07 cod small 50 B
3 2021-01-07 cod medium 600 A
4 2021-01-06 cod small 750 A
5 2021-01-06 haddock medium 700 A
Now, in line 1 and 2, only "Value" and "category" differs.
I have no idea how to approach this, so I will be very grateful for suggestions!
CodePudding user response:
Using dplyr
you can do
library(dplyr)
df %>%
group_by(Specie, Size) %>%
mutate(Date = if_else(category=="B", first(Date[category=="A"]), Date))
This goes the grouping, and then uses an ifelse
to change the category B dates to the first date in category A (first just helps incase there are multiple category A dates in the group)