Compare 3 columns in two different data frames and action based on comparison result

Time:02-02

I am trying to compare 3 columns in two different data frames and update a 'Flag' column based on the comparison results.

For example,

df_1 <- data.frame(x=seq(1,5,1),y=seq(1.5,5.5,1),z=LETTERS[1:5],Flag=0)


df_2 <- data.frame(l=c(1.2,10,3.2,5,5.2),k=c("A","F","P","D","E"))


If the row-wise element of df_2 (column l) lies between the row-wise elements of df_1 (columns x and y, inclusive) AND the row-wise element of df_1 (column z) matches the row-wise element of df_2 (column k), then Flag is set to 1, else 0. So the expected output is:

  x   y z Flag
1 1 1.5 A    1
2 2 2.5 B    0
3 3 3.5 C    0
4 4 4.5 D    0
5 5 5.5 E    1

My code thus far :

df_1 %>% mutate(Flag=map_chr(
  .x=c(as.character(df_1$z),as.numeric(df_1$x),as.numeric(df_1$y)),
  .f= ~if_else(condition = all(.x == as.character(df_2$k) & x >= as.numeric(df_2$l)& x<=as.numeric(df_2$k)),true="1",false="0")
))

Error:

Error in mutate():
! Problem while computing Flag = map_chr(...).
✖ Flag must be size 5 or 1, not 15.
Run rlang::last_error() to see where the error occurred.
There were 15 warnings (use warnings() to see them)

warnings()

Warning messages:
1: Problem while computing Flag = map_chr(...).
ℹ NAs introduced by coercion
2: Problem while computing Flag = map_chr(...).
ℹ NAs introduced by coercion
3: Problem while computing Flag = map_chr(...).
ℹ NAs introduced by coercion
... 15 times

As the data frames are quite large, I am trying to avoid loops.
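A minimal sketch of why this call likely fails, rebuilding the example data (the variable name dot_x is hypothetical, used only for illustration): c() joins the three 5-element columns into one 15-element character vector, so map_chr() runs 15 times for a 5-row data frame. The as.numeric(df_2$k) term inside the lambda turns letters into NA on each of those calls, which is the probable source of the 15 "NAs introduced by coercion" warnings. This assumes R 4.0 or later (or stringsAsFactors = FALSE), so the letter columns are character rather than factor.

```r
# Rebuild the question's example data (stringsAsFactors = FALSE keeps the
# letter columns as character, which is also the default from R 4.0 onwards)
df_1 <- data.frame(x = seq(1, 5, 1), y = seq(1.5, 5.5, 1), z = LETTERS[1:5],
                   Flag = 0, stringsAsFactors = FALSE)
df_2 <- data.frame(l = c(1.2, 10, 3.2, 5, 5.2), k = c("A", "F", "P", "D", "E"),
                   stringsAsFactors = FALSE)

# The .x argument from the question: c() concatenates three 5-element columns
# into ONE vector, so map_chr() iterates over 15 elements, not over 5 rows.
dot_x <- c(as.character(df_1$z), as.numeric(df_1$x), as.numeric(df_1$y))
print(length(dot_x))  # 15
print(class(dot_x))   # "character": c() coerces the numbers to strings

# as.numeric() on letters returns NA and raises "NAs introduced by coercion";
# the lambda evaluates this once per iteration.
print(suppressWarnings(as.numeric(df_2$k)))  # NA NA NA NA NA
```

Because map_chr() returns one value per element of .x, mutate() receives 15 values for a 5-row column, which matches the "must be size 5 or 1, not 15" message.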

Many thanks to @zephryl for answering my original question (above).

A variation of this problem: each row in df_1 should be compared with all the rows in df_2, and its Flag "incremented" for every row of df_2 that satisfies the condition.

For example, df_1:

[image: df_1]

df_2:

[image: df_2]

Expected output:

[image: expected output]

Thanks in advance!

Answer:

There is no need for map() or loops: R's comparison and logical operators are vectorized, so you can test your conditions on whole columns directly:

df_1$Flag <- ifelse(df_2$l >= df_1$x & df_2$l <= df_1$y & df_1$z == df_2$k, 1, 0)

or,

df_1$Flag <- with(cbind(df_1, df_2), ifelse(l >= x & l <= y & z == k, 1, 0))

or,

library(dplyr)

df_1 <- df_1 %>% 
  mutate(Flag = if_else(df_2$l >= x & df_2$l <= y & z == df_2$k, 1, 0))
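One caveat that applies to all three approaches: they pair row i of df_1 with row i of df_2, so they only make sense when both data frames have the same number of rows in matching order. The following is a minimal base-R sketch that makes this assumption explicit with stopifnot() and, as an alternative to ifelse(), converts the logical condition with as.integer(). The helper names in_range and same_code are hypothetical.

```r
df_1 <- data.frame(x = seq(1, 5, 1), y = seq(1.5, 5.5, 1), z = LETTERS[1:5],
                   Flag = 0, stringsAsFactors = FALSE)
df_2 <- data.frame(l = c(1.2, 10, 3.2, 5, 5.2), k = c("A", "F", "P", "D", "E"),
                   stringsAsFactors = FALSE)

# Row-wise pairing assumes equal row counts; otherwise R recycles the shorter
# vector (warning only when one length is not a multiple of the other).
stopifnot(nrow(df_1) == nrow(df_2))

# Each comparison is evaluated element by element across all rows at once
in_range  <- df_2$l >= df_1$x & df_2$l <= df_1$y   # l between x and y (inclusive)
same_code <- df_1$z == df_2$k                      # z matches k

# as.integer() maps TRUE/FALSE to 1L/0L, the same values ifelse(cond, 1, 0) gives
df_1$Flag <- as.integer(in_range & same_code)
print(df_1)
```

The only difference from ifelse(cond, 1, 0) is the storage type: Flag becomes an integer column instead of a double one.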

Result from all 3 approaches:

#> df_1
  x   y z Flag
1 1 1.5 A    1
2 2 2.5 B    0
3 3 3.5 C    0
4 4 4.5 D    0
5 5 5.5 E    1
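For the variation in the question (each row of df_1 checked against every row of df_2, with Flag counting the matches), one loop-free option is outer(). It builds an nrow(df_1) × nrow(df_2) logical matrix, and rowSums() then counts the TRUE values for each df_1 row. This is a minimal sketch, not the only approach. The two extra rows appended to df_2 are hypothetical and are added only so that one df_1 row matches more than once. Because the matrix has nrow(df_1) × nrow(df_2) cells, memory grows with the product of the row counts, so for very large data frames a non-equi join (for example with data.table) may scale better.

```r
df_1 <- data.frame(x = seq(1, 5, 1), y = seq(1.5, 5.5, 1), z = LETTERS[1:5],
                   Flag = 0, stringsAsFactors = FALSE)
# The question's df_2 plus two HYPOTHETICAL rows (1.4 "A" and 4.2 "D")
df_2 <- data.frame(l = c(1.2, 10, 3.2, 5, 5.2, 1.4, 4.2),
                   k = c("A", "F", "P", "D", "E", "A", "D"),
                   stringsAsFactors = FALSE)

# Entry [i, j] of each matrix compares row i of df_1 with row j of df_2
in_range  <- outer(df_1$x, df_2$l, "<=") & outer(df_1$y, df_2$l, ">=")
same_code <- outer(df_1$z, df_2$k, "==")

# Increment Flag by the number of df_2 rows satisfying both conditions
df_1$Flag <- df_1$Flag + rowSums(in_range & same_code)
print(df_1)
```

With this data, row A picks up two matches (1.2 and 1.4), rows D and E one each, and rows B and C none.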