I want to create new columns in df1 based on two columns in df2. df1 has multiple columns ("x", "y" , "z"), with possible values 0,1, or NA. df2 has two columns: "a" and "b". The values of column "b" include the column names of df1 (although some values in df2$a may be present as columns in df1).
x <- c(1, 1, NA, NA, 0)
y <- c(0, 1, 1, NA, NA)
z <- c(1, 1, 0, 1, 1)
df1 <- data.frame(x, y, z)
a <- c( "Green", "Green", "Green", "Red", "Red", "Blue", "Blue", "Yellow")
b <- c( "w", "x", "y", "x", "z", "w" , "y", "z" )
df2 <- data.frame(a, b)
The new columns to be created in df1 should be named as the values of column df2$a. The value of each new column (i.e df1$Green) should be "1" when
- the values of df2$a and df2$b are matched (example: df2$a = "Green" matches df2$b = "w", "x" and "y"), and
- the corresponding column in df1 (df1$x and df$y) equals "1".
The new_df1 should be:
new_df1
x y z Green Red Blue Yellow
1 1 0 1 1 1 0 1
2 1 1 1 1 1 1 1
3 NA 1 0 1 0 1 0
4 NA NA 1 0 1 0 1
5 0 NA 1 0 1 0 1
CodePudding user response:
In pure base R:
x <- c(1, 1, NA, NA, 0)
y <- c(0, 1, 1, NA, NA)
z <- c(1, 1, 0, 1, 1)
df1 <- data.frame(x, y, z)
a <- c( "Green", "Green", "Green", "Red", "Red", "Blue", "Blue", "Yellow")
b <- c( "w", "x", "y", "x", "z", "w" , "y", "z" )
df2 <- data.frame(a, b)
# table of counts of colors/letters
d <- as.data.frame(t(unclass(table(df2))))
cols <- (t(apply(df1, 1, \(x) apply(d[colnames(df1)[as.logical(x)],],2,\(x) as.numeric(any(x))))))
#if you do not want NA's
cols[is.na(cols)] <- 0
# your desired order (can also use cbind)
df1[,c("Green", "Red","Blue","Yellow")] <- cols[,c("Green", "Red","Blue","Yellow")]
df1
#> x y z Green Red Blue Yellow
#> 1 1 0 1 1 1 0 1
#> 2 1 1 1 1 1 1 1
#> 3 NA 1 0 1 0 1 0
#> 4 NA NA 1 0 1 0 1
#> 5 0 NA 1 0 1 0 1
d
is a table of count of colors in columns, letters as rownames.
First apply
searches for every row in df1
the rows -letters- in d
that are 1 in df1. Second apply
uses any
to find in d
ocurrence of any "1" for every color column. Note that \(x)
is shorthand for function(x)
CodePudding user response:
With split
rowSums
:
df3 <-
split(df2, df2$a) |>
sapply(\(x) (rowSums(df1[intersect(colnames(df1), x$b)], na.rm = TRUE) > 0))
> cbind.data.frame(df1, df3)
# x y z Blue Green Red Yellow
# 1 1 0 1 0 1 1 1
# 2 1 1 1 1 1 1 1
# 3 NA 1 0 1 1 0 0
# 4 NA NA 1 0 0 1 1
# 5 0 NA 1 0 0 1 1