Home > Enterprise >  I´m looking for a way to inverse the values of 2 columns of a row when this inverse exists in the da
I´m looking for a way to inverse the values of 2 columns of a row when this inverse exists in the da

Time:01-06

Here is my dataframe:

DF <- data.frame(
     VAR1 = c("A", "A", "B", "B", "B", "C", "C"),
     VAR2 = c("B", "C", "A", "D", "C", "B", "D"),
     VAR3 = c(1, 1, 1, 2, 4, 6, 4)
    )

I would like to have this:

VAR1 VAR2 VAR3  
A    B    2 
A    C    1
B    D    2
B    C    10  
C    D    4

If There is two rows like (VAR1=A, VAR2=B, VAR3=X) and (VAR2=B, VAR1=A, VAR3=Y), I want to have one row like this one (VAR1=A, VAR2=B, VAR3=X Y). So if the two first variables are "inverse", I would like to have one row with the sum of them.

I tried to have a column which says "Yes" if two rows have inverse values but I can´t find a way to do it. My code:

DF <- DF %>%
  mutate(VAR4 = case_when(VAR2 %in% DF$VAR1 & 
                          VAR1 %in% 
                            (DF %>% 
                                filter(VAR1 == VAR2) %>% 
                                pull(VAR2)
                             ) ~ "Yes",
                             TRUE ~ 'No' ))
    `

This is the result:

VAR1   VAR2   VAR3   VAR4  
A      B      1      No
A      C      1      No 
B      A      1      No 
B      D      2      No 
B      C      4      No 
C      B      6      No 
C      D      4      No

My code doesn´t work because my filter doesn´t take the result of VAR2 %in% DF$VAR1 in account.

Does someone have an idea?

CodePudding user response:

You could try:

library(dplyr)

DF %>%
  mutate(across(VAR1:VAR2, as.character)) %>%
  group_by(idx1 = pmin(VAR1, VAR2), idx2 = pmax(VAR1, VAR2)) %>%
  summarise(VAR3 = sum(VAR3)) %>%
  rename_with(~ sub('idx', 'VAR', .)) %>%
  ungroup

Output:

# A tibble: 5 x 3
  VAR1  VAR2   VAR3
  <chr> <chr> <dbl>
1 A     B         2
2 A     C         1
3 B     C        10
4 B     D         2
5 C     D         4

CodePudding user response:

You can sort first with apply, and then summarise:

DF[1:2] <- t(apply(DF[1:2], 1, sort))
DF %>% 
  group_by(VAR1, VAR2) %>% 
  summarise(VAR3 = sum(VAR3))

# A tibble: 5 × 3
# Groups:   VAR1 [3]
  VAR1  VAR2   VAR3
  <chr> <chr> <dbl>
1 A     B         2
2 A     C         1
3 B     C        10
4 B     D         2
5 C     D         4

Or, in single pipe:

DF %>% 
  mutate(VAR = pmap(., ~ sort(c(..1, ..2)) %>% 
                      set_names(., c("VAR1", "VAR2")))) %>% 
  group_by(VAR) %>% 
  summarise(VAR3 = sum(VAR3)) %>% 
  unnest_wider(VAR)
  • Related