Home > Software design >  Matching values of two data frames based on multiple conditions in R
Matching values of two data frames based on multiple conditions in R

Time:11-19

I have a two datasets

cycle <- c(160, 160, 150, 158, 180) 
split1 <- c(2, 2,4, 6, 8) 
split2 <- c(10,10, 12, 14, 16) 
df1 <- data.frame(cycle, split1, split2) 
df1
  cycle split1 split2
1   160      2     10
2   160      2     10
3   150      4     12
4   158      6     14
5   180      8     16

cycle <- c(160,150,190,180,161,150,140,179)
split1 <- c(2,4,12,8,2,4,32,8)
split2 <- c(10, 12, 18, 16, 10, 12, 21, 16)
df2 <- data.frame(cycle, split1, split2)
df2
  cycle split1 split2
1   160      2     10
2   150      4     12
3   190     12     18
4   180      8     16
5   161      2     10
6   150      4     12
7   140     32     21
8   179      8     16

I want to match the values of df1 and df2 and label the df2 values based on two conditions:

1- If the values of all three columns i.e cycle, split1, and split2 are exactly the same then assign a row with the label "Same" otherwise "Different".

2- If the difference of only cycle value from df1 and df2 is 1 or -1 and the rest of the row values are the same then assign a row with the label "Same" otherwise "Different".

The output should look like this

  cycle split1 split2      Type
1   160      2     10      Same
2   150      4     12      Same
3   190     12     18 Different
4   180      8     16      Same
5   161      2     10      Same
6   150      4     12      Same
7   140     32     21 Different
8   179      8     16      Same

I was successful in achieving the first condition as below

df1<- df1 %>% mutate(key = paste0(cycle,split1, split2, "_"))
df2<- df2 %>% mutate(key = paste0(cycle,split1, split2, "_"))
df2 %>% mutate(Type = ifelse(df2$key %in% df1$key, 'same', 'different'))%>%
  select(-key)

  cycle split1 split2      Type
1   160      2     10      same
2   150      4     12      same
3   190     12     18 different
4   180      8     16      same
5   161      2     10 different
6   150      4     12      same
7   140     32     21 different
8   179      8     16 different

but having a problem achieving the second one.

Any idea how to do this efficiently?

Thank you in advance.

CodePudding user response:

Based on your original df1 and df2 (without the generation of the new column key), you could use

df2 %>% 
  mutate(rn = row_number()) %>% 
  left_join(df1, by = c("split1", "split2"), suffix = c("", ".y")) %>% 
  mutate(
    type = coalesce(
      ifelse(abs(cycle - cycle.y) <= 1, "same", "different"), 
      "different")
    ) %>% 
  group_by(rn) %>% 
  distinct() %>% 
  ungroup() %>% 
  select(-rn, -cycle.y)

This returns

# A tibble: 8 x 4
  cycle split1 split2 type     
  <dbl>  <dbl>  <dbl> <chr>    
1   160      2     10 same     
2   150      4     12 same     
3   190     12     18 different
4   180      8     16 same     
5   161      2     10 same     
6   150      4     12 same     
7   140     32     21 different
8   179      8     16 same  
  • Related