I have a two datasets
cycle <- c(160, 160, 150, 158, 180)
split1 <- c(2, 2,4, 6, 8)
split2 <- c(10,10, 12, 14, 16)
df1 <- data.frame(cycle, split1, split2)
df1
cycle split1 split2
1 160 2 10
2 160 2 10
3 150 4 12
4 158 6 14
5 180 8 16
cycle <- c(160,150,190,180,161,150,140,179)
split1 <- c(2,4,12,8,2,4,32,8)
split2 <- c(10, 12, 18, 16, 10, 12, 21, 16)
df2 <- data.frame(cycle, split1, split2)
df2
cycle split1 split2
1 160 2 10
2 150 4 12
3 190 12 18
4 180 8 16
5 161 2 10
6 150 4 12
7 140 32 21
8 179 8 16
I want to match the values of df1 and df2 and label the df2 values based on two conditions:
1- If the values of all three columns i.e cycle, split1, and split2 are exactly the same then assign a row with the label "Same" otherwise "Different".
2- If the difference of only cycle value from df1 and df2 is 1 or -1 and the rest of the row values are the same then assign a row with the label "Same" otherwise "Different".
The output should look like this
cycle split1 split2 Type
1 160 2 10 Same
2 150 4 12 Same
3 190 12 18 Different
4 180 8 16 Same
5 161 2 10 Same
6 150 4 12 Same
7 140 32 21 Different
8 179 8 16 Same
I was successful in achieving the first condition as below
df1<- df1 %>% mutate(key = paste0(cycle,split1, split2, "_"))
df2<- df2 %>% mutate(key = paste0(cycle,split1, split2, "_"))
df2 %>% mutate(Type = ifelse(df2$key %in% df1$key, 'same', 'different'))%>%
select(-key)
cycle split1 split2 Type
1 160 2 10 same
2 150 4 12 same
3 190 12 18 different
4 180 8 16 same
5 161 2 10 different
6 150 4 12 same
7 140 32 21 different
8 179 8 16 different
but having a problem achieving the second one.
Any idea how to do this efficiently?
Thank you in advance.
CodePudding user response:
Based on your original df1
and df2
(without the generation of the new column key
), you could use
df2 %>%
mutate(rn = row_number()) %>%
left_join(df1, by = c("split1", "split2"), suffix = c("", ".y")) %>%
mutate(
type = coalesce(
ifelse(abs(cycle - cycle.y) <= 1, "same", "different"),
"different")
) %>%
group_by(rn) %>%
distinct() %>%
ungroup() %>%
select(-rn, -cycle.y)
This returns
# A tibble: 8 x 4
cycle split1 split2 type
<dbl> <dbl> <dbl> <chr>
1 160 2 10 same
2 150 4 12 same
3 190 12 18 different
4 180 8 16 same
5 161 2 10 same
6 150 4 12 same
7 140 32 21 different
8 179 8 16 same