Fill a column with data from other columns, conditional on another column in R

Time:06-28

I have a df as below with columns ID to D, and I would like to create column E. Column E should be filled with the character data from column A, B or C, depending on the value in column D (1 = A, 2 = B, 3 = C).

ID A B C D E
t1 tsg tlm NA 1 tsg
t2 tsg tlm NA 2 tlm
t3 tfp tsl tgg 3 tgg
t4 tsg tfp NA 2 tfp
t5 tlm NA NA 1 tlm
t6 tgg tlm NA 1 tgg

I have been trying to do it like this but it won't work and I can't understand why.

df$E[df$D == 1 ] <- df$A
df$E[df$D == 2 ] <- df$B
df$E[df$D == 3 ] <- df$C

If the solution could be in base R I'd also be very grateful!

CodePudding user response:

If you do it this way you are going to get a warning:

Warning message:
In df$E[df$D == 1] <- df$A :
  number of items to replace is not a multiple of replacement length

This is because you are trying to replace df$E[df$D == 1], a vector of length 3, with the whole of df$A, a vector of length 6, rather than with the elements of df$A at the same row indices.
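To see the mismatch concretely, here is a minimal sketch on plain vectors copied from the question's columns A and D. The names n_slots, n_values and E_wrong are made up for illustration, and the warning is silenced only so the result can be inspected.

```r
# Columns A and D from the question, as plain vectors.
A <- c("tsg", "tsg", "tfp", "tsg", "tlm", "tgg")
D <- c(1L, 2L, 3L, 2L, 1L, 1L)

n_slots  <- sum(D == 1)  # positions selected on the left-hand side
n_values <- length(A)    # values supplied on the right-hand side

# Reproduce the mismatched assignment (warning silenced for the demo).
E_wrong <- rep(NA_character_, length(D))
suppressWarnings(E_wrong[D == 1] <- A)

# R uses only the first n_slots values of A, so positions 5 and 6
# receive A[2] and A[3] instead of A[5] and A[6].
E_wrong
```

So the assignment does not fail outright; it quietly fills the selected rows with values taken from the wrong rows of A.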

You need to do:

df$E[df$D == 1 ] <- df$A[df$D == 1 ]
df$E[df$D == 2 ] <- df$B[df$D == 2 ]
df$E[df$D == 3 ] <- df$C[df$D == 3 ]

Alternatively, instead of doing one line per value of df$D, to generalise you could do something like:

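col_index <- df$D + 1  # D counts from column A; + 1 skips the ID column

# iterate over rows, not columns; \(i) is shorthand for function(i) (R >= 4.1)
df$E <- sapply(seq_len(nrow(df)), \(i) df[i, col_index[i]])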
df
#   ID   A    B    C D   E
# 1 t1 tsg  tlm <NA> 1 tsg
# 2 t2 tsg  tlm <NA> 2 tlm
# 3 t3 tfp  tsl  tgg 3 tgg
# 4 t4 tsg  tfp <NA> 2 tfp
# 5 t5 tlm <NA> <NA> 1 tlm
# 6 t6 tgg  tlm <NA> 1 tgg

CodePudding user response:

You can use cbind for indexing, i.e.,

> transform(df, E = df[-1][cbind(seq_along(D), D)])
  ID   A    B    C D   E
1 t1 tsg  tlm <NA> 1 tsg
2 t2 tsg  tlm <NA> 2 tlm
3 t3 tfp  tsl  tgg 3 tgg
4 t4 tsg  tfp <NA> 2 tfp
5 t5 tlm <NA> <NA> 1 tlm
6 t6 tgg  tlm <NA> 1 tgg
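To unpack what the cbind index is doing, here is a small sketch of the same idea on an explicit matrix. It assumes the data from the question; the names m, idx and picked are made up for illustration.

```r
# Candidate columns and selector column from the question.
df <- data.frame(
  A = c("tsg", "tsg", "tfp", "tsg", "tlm", "tgg"),
  B = c("tlm", "tlm", "tsl", "tfp", NA, "tlm"),
  C = c(NA, NA, "tgg", NA, NA, NA),
  D = c(1L, 2L, 3L, 2L, 1L, 1L)
)

# Character matrix holding only the columns to pick from.
m <- as.matrix(df[c("A", "B", "C")])

# Two-column index matrix: each row is a (row, column) pair.
idx <- cbind(seq_len(nrow(df)), df$D)

# Indexing a matrix by a two-column matrix returns one element per pair.
picked <- m[idx]
picked
```

Each row of idx selects a single cell, so the whole column is filled in one vectorised step with no loop over rows.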

Data

> dput(df)
structure(list(ID = c("t1", "t2", "t3", "t4", "t5", "t6"), A = c("tsg",
"tsg", "tfp", "tsg", "tlm", "tgg"), B = c("tlm", "tlm", "tsl",
"tfp", NA, "tlm"), C = c(NA, NA, "tgg", NA, NA, NA), D = c(1L,
2L, 3L, 2L, 1L, 1L), E = c("tsg", "tlm", "tgg", "tfp", "tlm",
"tgg")), row.names = c(NA, -6L), class = "data.frame")