Home > front end >  Creating a new column based on several exiting columns
Creating a new column based on several exiting columns

Time:05-18

Hoping to create the new column D based on three existing columns: "A" "B" and "C". The dataset also have other variables E, F, G, etc.

Whenever A or B or C has a value, other two columns have NAs (E,F, G, not affected by them). The new variable "D" I need should import whatever the existing values from any of the A,B, or C columns.

 A    B     C       D    E  F  G
 1    NA    NA      1
 NA   2     NA      2
 NA   4     NA      4
 NA   NA    2       2
 NA   NA    3       3
 

Any simple codes within any packages that can do the trick? Thank you in advance!

I have seen other codes that can do the work but their datasets only have A,B and C, but my data set has other existing columns, so I need codes that can specify the A, B and C columns.

CodePudding user response:

One option is to use coalesce on the 'A', 'B', 'C' to create the 'D' - coalesce will return the column with the first non-NA value per each row

library(dplyr)
df1 <- df1 %>%
  mutate(D = coalesce(A, B, C), .after = 'C')

CodePudding user response:

A base R way to do it is to use pmax:

Data:

df <- data.frame(A = c(1, NA, NA, NA, NA),
                 B = c(NA, 2, 4, NA, NA),
                 C = c(NA, NA, NA, 2, 3))

Code:

df$D <- pmax(df$A, df$B, df$C, na.rm = TRUE)

# or

df$D <- with(df, pmax(A, B, C, na.rm = TRUE))

Output:

#    A  B  C D
# 1  1 NA NA 1
# 2 NA  2 NA 2
# 3 NA  4 NA 4
# 4 NA NA  2 2
# 5 NA NA  3 3

CodePudding user response:

Update using across:

df %>% 
  mutate(D = rowSums(across(A:C), na.rm = TRUE))

OR

We could use mutate with rowSums:

library(dplyr)

df %>% 
  mutate(D = rowSums(.[1:3], na.rm = TRUE))
   A  B  C D E F G
1  1 NA NA 1 1 1 1
2 NA  2 NA 2 1 1 1
3 NA  4 NA 4 1 1 1
4 NA NA  2 2 1 1 1
5 NA NA  3 3 1 1 1

data:

df <- structure(list(A = c(1L, NA, NA, NA, NA), B = c(NA, 2L, 4L, NA, 
NA), C = c(NA, NA, NA, 2L, 3L), D = c(1L, 2L, 4L, 2L, 3L), E = c(1L, 
1L, 1L, 1L, 1L), F = c(1L, 1L, 1L, 1L, 1L), G = c(1L, 1L, 1L, 
1L, 1L)), class = "data.frame", row.names = c(NA, -5L))
  •  Tags:  
  • r
  • Related