Home > Enterprise >  How to change value of variable, based on several conditions - R?
How to change value of variable, based on several conditions - R?

Time:06-28

I have a quite large dataset where I need to aggregate some of the rows based on several conditions. But first I have to change the value of one of the variables; the date variable.

Below is an example:

df <- data.frame(
  Date=c("2021-01-07", "2021-01-10", "2021-01-07", "2021-01-06", "2021-01-06"),
  Specie=c("cod", "cod", "cod", "cod", "haddock"),
  Size=c("small", "small", "medium", "small", "medium"),
  category=c("A", "B", "A", "A", "A"),
  Value=c(500, 50, 600, 750, 700)
)
> df
        Date  Specie   Size Value category
1 2021-01-07     cod  small   500        A
2 2021-01-10     cod  small    50        B
3 2021-01-07     cod medium   600        A
4 2021-01-06     cod  small   750        A
5 2021-01-06 haddock medium   700        A

I need to change the value of the "Date" variable, when "category"==B, to the same date as in "category"==A, when values in "Specie" and "Size" are equal in the two rows. In the example data above I want to change the date in line 2 to make it the same as the date in line 1, like this:

> df
        Date  Specie   Size Value category
1 2021-01-07     cod  small   500        A
2 2021-01-07     cod  small    50        B
3 2021-01-07     cod medium   600        A
4 2021-01-06     cod  small   750        A
5 2021-01-06 haddock medium   700        A

Now, in line 1 and 2, only "Value" and "category" differs.

I have no idea how to approach this, so I will be very grateful for suggestions!

CodePudding user response:

Using dplyr you can do

library(dplyr)
df %>% 
  group_by(Specie, Size) %>% 
  mutate(Date = if_else(category=="B", first(Date[category=="A"]), Date))

This goes the grouping, and then uses an ifelse to change the category B dates to the first date in category A (first just helps incase there are multiple category A dates in the group)

  • Related