Home > Software design >  Merge conditional dataframe rows repeating values
Merge conditional dataframe rows repeating values

Time:11-10

I have one dataframe (X) that looks like

label1  label2
1        2
1        3
1        4
2        1
2        3
2        4

And another (Y) that looks like

label  COLONY
1       5
2       5     
3       5
4       6

I have compared the labels in dataframe X to Y if they are in the same colony listed in dataframe Y and made a new dataframe with this line:

Z <- transform(X, SAME.COLONY = Y$COLONY[match(X$label1, Y$label)] == Y$COLONY[match(X$label2, Y$label)])
label1    label2  SAME.COLONY
1           2        TRUE
1           3        TRUE
1           4        FALSE
2           1        TRUE
2           3        TRUE
2           4        FALSE
3           4        FALSE

Now I am looking to import a new column from dataframe Y into Z with the group value only if Z$SAME.COLONY==T and the one of the label numbers match, but this isn't working for me:

Z$COLONY<- ifelse(Z$SAME.COLONY == T && Z$label1 == Y$label, Y$COLONY, NA)

I get this warning message:

Warning message:
In Z$label1 == Y$label :
  longer object length is not a multiple of shorter object length

This may be becuase label values are repeated multiple time in Z$label1 and Z$label2 but I'm not sure how to account for this?

Reproduce data:

X=data.frame(label1=c(1,1,1,2,2,2,3),  label2=c(2,3,4,1,3,4,4))
Y=data.frame(label=c(1,2,3,4),  COLONY=c(5,5,5,6))

Z <- transform(X, 
  SAME.COLONY = 
    Y$COLONY[match(X$label1, Y$label)] == 
      Y$COLONY[match(X$label2, Y$label)]
)

CodePudding user response:

We may do a join

library(dplyr)
Z <- left_join(Z, Y, by = c("label1" = "label")) %>%
    mutate(COLONY = case_when(SAME.COLONY~ COLONY))

-output

Z
  label1 label2 SAME.COLONY COLONY
1      1      2        TRUE      5
2      1      3        TRUE      5
3      1      4       FALSE     NA
4      2      1        TRUE      5
5      2      3        TRUE      5
6      2      4       FALSE     NA
7      3      4       FALSE     NA

CodePudding user response:

If I understood correctly, a solution can be by using dplyr:

library(dplyr)

X=data.frame(label1=c(1,1,1,2,2,2,3),  label2=c(2,3,4,1,3,4,4))
Y=data.frame(label=c(1,2,3,4),  COLONY=c(5,5,5,6))


Z=X%>%
  rowwise()%>%
  # check if COLONY of label1 is the same of COLONY of label2
  mutate(SAME.COLONY = (Y[Y$label == label1, ]$COLONY == Y[Y$label == label2, ]$COLONY)
         )%>%
  # get COLONY if True
  mutate(COLONY=ifelse(SAME.COLONY==T,Y[Y$label==label1,]$COLONY,NA))

Z

# > Z
# A tibble: 7 x 4
# Rowwise: 
# label1 label2 SAME.COLONY COLONY
# <dbl>  <dbl> <lgl>        <dbl>
#   1      1      2 TRUE            5
# 2      1      3 TRUE             5
# 3      1      4 FALSE           NA
# 4      2      1 TRUE             5
# 5      2      3 TRUE             5
# 6      2      4 FALSE           NA
# 7      3      4 FALSE           NA
  • Related