How do I replace the value of an observation based on another variable's values, within a group

Time:10-26

I'm trying to replace values based on the value of another observation within a group, where the matching observation is identified through a second variable. In particular, I want to transform the dataframe below so that the variable "Flow" contains no zeros: each 0 should be replaced by the negative of the Flow recorded for the reversed "Border". Assume the following dataframe:

df <- data.frame("Hour" = c(rep(1, 6), rep(2, 6)),
                 "Border" = rep(c("BE - NL", "NL - BE", "BE - FR", "FR - BE", "BE - DE", "DE - BE"), 2),
                 "Flow" = c(0, 100, 200, 0, 150, 0, 250, 0, 50, 0, 300, 0))

What I want is for the value in row 1 (Hour = 1, Border = "BE - NL") to equal the negative of the value in row 2 (Hour = 1, Border = "NL - BE"). Likewise, the value in row 4 (Hour = 1, Border = "FR - BE") should equal the negative of the value in row 3 (Hour = 1, Border = "BE - FR"). The same applies to the other hours: whenever Flow is zero, it should take the negative of the Flow observed in the same hour for the reversed Border.

Any suggestions on how to do this? If this works within dplyr, that would be great, but other suggestions are more than welcome!

My thinking was to use the group_by %>% mutate logic, combined with a case_when(), to modify only the Flow = 0 values, but I can't figure out how to reference / subset the Border variable:

df %>% 
  group_by(Hour) %>% 
  mutate(Flow = case_when(Flow != 0 ~ Flow,
                          Flow == 0 ~ -Flow[#not sure how to reference the subset])

CodePudding user response:

Here is one option: split the 'Border' column into its two country codes, sort each pair so that both directions of a border get the same key, and group by that key. Within each group, the 0 value is then replaced with the negative of the non-zero Flow value.

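library(dplyr)
library(tidyr)
df %>% 
  # split "BE - NL" into b1 = "BE" and b2 = "NL"
  separate(Border, into = c("b1", "b2"), sep = "\\s*-\\s*", 
    remove = FALSE) %>% 
  # pmin/pmax sort each pair, so both directions of a border share one group
  group_by(Hour, b1new = pmin(b1, b2), b2new = pmax(b1, b2)) %>% 
  # works when each zero has a non-zero counterpart in its group
  mutate(Flow = replace(Flow, Flow == 0, -Flow[Flow != 0])) %>% 
  ungroup() %>% 
  select(all_of(names(df)))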

Output:

# A tibble: 12 × 3
    Hour Border   Flow
   <dbl> <chr>   <dbl>
 1     1 BE - NL  -100
 2     1 NL - BE   100
 3     1 BE - FR   200
 4     1 FR - BE  -200
 5     1 BE - DE   150
 6     1 DE - BE  -150
 7     2 BE - NL   250
 8     2 NL - BE  -250
 9     2 BE - FR    50
10     2 FR - BE   -50
11     2 BE - DE   300
12     2 DE - BE  -300
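
The case_when() idea from the question can also be completed directly. The following is a minimal sketch, not part of the answer above: it assumes every border label has the form "A - B", builds a helper column called Reverse (a name chosen here for illustration), and uses match() to locate the reversed border within each hour. If a reversed border is missing in some hour, the zero becomes NA instead.

```r
library(dplyr)

df <- data.frame(
  Hour   = c(rep(1, 6), rep(2, 6)),
  Border = rep(c("BE - NL", "NL - BE", "BE - FR", "FR - BE", "BE - DE", "DE - BE"), 2),
  Flow   = c(0, 100, 200, 0, 150, 0, 250, 0, 50, 0, 300, 0)
)

result <- df %>%
  # Reverse is a helper column (hypothetical name): "BE - NL" becomes "NL - BE"
  mutate(Reverse = sub("^(.*) - (.*)$", "\\2 - \\1", Border)) %>%
  group_by(Hour) %>%
  # match() gives, per row, the position of the reversed border within the hour
  mutate(Flow = case_when(
    Flow == 0 ~ -Flow[match(Reverse, Border)],
    TRUE      ~ Flow
  )) %>%
  ungroup() %>%
  select(-Reverse)

result
```

For the example data this yields the same Flow column as the output above. Unlike the replace() version, it does not need to split Border into separate columns, at the cost of the extra helper column.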