I have the following two datasets:
df1 <- data.frame(
"group" = c(1, 1, 2, 2, 2, 3, 3, 4, 5, 5),
"numbers" = c(55, 75, 60, 65, 32, 33, 55, 43, 75, 70),
"class" = c("b", "b", "a", "c", "a", "b", "a", "c", "a", "b"))
df2 <- data.frame(
"group" = c(1, 1, 2, 2, 2, 3, 3, 4, 5),
"P1" = c(55, NA, 60, 65, NA, NA, 55, 43, 60),
"P2" = c(55, 75, 65, 60, NA, 33, 55, NA, 75),
"P3" = c(75, 55, 60, 65, 33, 32, 43, 55, 70))
I want to check whether the values in the column "numbers" in df1 are contained in the columns P1, P2, and P3 of df2. There are two problems I am stuck with. 1. the values in the numbers column of df1 can occur in different groups in df2 (defined by the group column in df1 and df2). 2. the datasets have different lengths. Is there a way to merge both datasets and have the following dataset:
df3 <- data.frame(
"group" = c(1, 1, 2, 2, 2, 3, 3, 4, 5, 5),
"number" = c(55, 75, 60, 65, 32, 33, 55, 43, 75, 70),
"class" = c("b", "b", "a", "c", "a", "b", "a", "c", "a", "b"),
"P1new" = c(1, 0, 1, 1, 0, 0, 1, 1, 0, 0),
"P2new" = c(1, 1, 1, 1, 0, 1, 1, 0, 1, 0),
"P3new" = c(1, 1, 1, 1, 1, 1, 1, 1, 0, 1)))
where P1new (P2new and P3new respectively) contain the value 1 if df2$P1 contains the value in df1$numbers within the correct group (as I said numbers can reoccur in different groups). For example, P3 has the value 75 in group 1 but not in group 5. So in group 1 P3new would have a 1 and in group 5 P3new would have a 0. I would really appreciate any help.
CodePudding user response:
Here is a data.table
approach.
There are some incomplete cases (class does not appear in df1). You can easily drop them if you want...
library(data.table)
# Convert to data.table
setDT(df1); setDT(df2)
# Melt df2 to long format
df2.melt <- melt(df2, id.vars = "group", na.rm = TRUE)
# Join classes from df1 to df2.melt
df2.melt[df1, class := i.class, on = .(group, value = numbers)][]
# Cast back to wide format
dcast(df2.melt, group value class ~ paste0(variable, "New"), fun.aggregate = length)
# group value class P1New P2New P3New
# 1: 1 55 b 1 1 1
# 2: 1 75 b 0 1 1
# 3: 2 33 <NA> 0 0 1
# 4: 2 60 a 1 1 1
# 5: 2 65 c 1 1 1
# 6: 3 32 <NA> 0 0 1
# 7: 3 33 b 0 1 0
# 8: 3 43 <NA> 0 0 1
# 9: 3 55 a 1 1 0
#10: 4 43 c 1 0 0
#11: 4 55 <NA> 0 0 1
#12: 5 60 <NA> 1 0 0
#13: 5 70 b 0 0 1
#14: 5 75 a 0 1 0
addition based on comment below
dcast(df2.melt, group value class ~ paste0(variable, "New"), value.var = "value")
# group value class P1New P2New P3New
# 1: 1 55 b 55 55 55
# 2: 1 75 b NA 75 75
# 3: 2 33 <NA> NA NA 33
# 4: 2 60 a 60 60 60
# 5: 2 65 c 65 65 65
# 6: 3 32 <NA> NA NA 32
# 7: 3 33 b NA 33 NA
# 8: 3 43 <NA> NA NA 43
# 9: 3 55 a 55 55 NA
#10: 4 43 c 43 NA NA
#11: 4 55 <NA> NA NA 55
#12: 5 60 <NA> 60 NA NA
#13: 5 70 b NA NA 70
#14: 5 75 a NA 75 NA