Home > Software engineering >  How to recode multiple columns of one dataframe based on a column from another dataframe?
How to recode multiple columns of one dataframe based on a column from another dataframe?

Time:10-30

I want to create new columns in df1 based on two columns in df2. df1 has multiple columns ("x", "y" , "z"), with possible values 0,1, or NA. df2 has two columns: "a" and "b". The values of column "b" include the column names of df1 (although some values in df2$a may be present as columns in df1).

x <- c(1, 1, NA, NA, 0)
y <- c(0, 1, 1, NA, NA)
z <- c(1, 1, 0, 1, 1)

df1 <- data.frame(x, y, z)

a <- c( "Green", "Green", "Green",  "Red", "Red",   "Blue", "Blue", "Yellow")
b <- c(  "w",       "x",    "y",     "x",    "z",     "w" ,   "y",    "z"   )

df2 <- data.frame(a, b)

The new columns to be created in df1 should be named as the values of column df2$a. The value of each new column (i.e df1$Green) should be "1" when

  • the values of df2$a and df2$b are matched (example: df2$a = "Green" matches df2$b = "w", "x" and "y"), and
  • the corresponding column in df1 (df1$x and df$y) equals "1".

The new_df1 should be:

new_df1

   x  y z Green Red Blue Yellow
1  1  0 1     1   1    0      1
2  1  1 1     1   1    1      1
3 NA  1 0     1   0    1      0
4 NA NA 1     0   1    0      1
5  0 NA 1     0   1    0      1

CodePudding user response:

In pure base R:


x <- c(1, 1, NA, NA, 0)
y <- c(0, 1, 1, NA, NA)
z <- c(1, 1, 0, 1, 1)

df1 <- data.frame(x, y, z)

a <- c( "Green", "Green", "Green",  "Red", "Red",   "Blue", "Blue", "Yellow")
b <- c(  "w",       "x",    "y",     "x",    "z",     "w" ,   "y",    "z"   )
df2 <- data.frame(a, b)

# table of counts of colors/letters
d <- as.data.frame(t(unclass(table(df2))))

cols <- (t(apply(df1, 1, \(x) apply(d[colnames(df1)[as.logical(x)],],2,\(x) as.numeric(any(x))))))

#if you do not want NA's
cols[is.na(cols)] <- 0

# your desired order (can also use cbind)
df1[,c("Green", "Red","Blue","Yellow")] <- cols[,c("Green", "Red","Blue","Yellow")]

df1
#>    x  y z Green Red Blue Yellow
#> 1  1  0 1     1   1    0      1
#> 2  1  1 1     1   1    1      1
#> 3 NA  1 0     1   0    1      0
#> 4 NA NA 1     0   1    0      1
#> 5  0 NA 1     0   1    0      1

d is a table of count of colors in columns, letters as rownames. First apply searches for every row in df1 the rows -letters- in d that are 1 in df1. Second apply uses any to find in d ocurrence of any "1" for every color column. Note that \(x) is shorthand for function(x)

CodePudding user response:

With split rowSums:

df3 <- 
  split(df2, df2$a) |>
  sapply(\(x)  (rowSums(df1[intersect(colnames(df1), x$b)], na.rm = TRUE) > 0))

> cbind.data.frame(df1, df3)
#    x  y z Blue Green Red Yellow
# 1  1  0 1    0     1   1      1
# 2  1  1 1    1     1   1      1
# 3 NA  1 0    1     1   0      0
# 4 NA NA 1    0     0   1      1
# 5  0 NA 1    0     0   1      1
  • Related