Hoping to create the new column D based on three existing columns: "A" "B" and "C". The dataset also have other variables E, F, G, etc.
Whenever A or B or C has a value, other two columns have NAs (E,F, G, not affected by them). The new variable "D" I need should import whatever the existing values from any of the A,B, or C columns.
A B C D E F G
1 NA NA 1
NA 2 NA 2
NA 4 NA 4
NA NA 2 2
NA NA 3 3
Any simple codes within any packages that can do the trick? Thank you in advance!
I have seen other codes that can do the work but their datasets only have A,B and C, but my data set has other existing columns, so I need codes that can specify the A, B and C columns.
CodePudding user response:
One option is to use coalesce
on the 'A', 'B', 'C' to create the 'D' - coalesce
will return the column with the first non-NA value per each row
library(dplyr)
df1 <- df1 %>%
mutate(D = coalesce(A, B, C), .after = 'C')
CodePudding user response:
A base R way to do it is to use pmax
:
Data:
df <- data.frame(A = c(1, NA, NA, NA, NA),
B = c(NA, 2, 4, NA, NA),
C = c(NA, NA, NA, 2, 3))
Code:
df$D <- pmax(df$A, df$B, df$C, na.rm = TRUE)
# or
df$D <- with(df, pmax(A, B, C, na.rm = TRUE))
Output:
# A B C D
# 1 1 NA NA 1
# 2 NA 2 NA 2
# 3 NA 4 NA 4
# 4 NA NA 2 2
# 5 NA NA 3 3
CodePudding user response:
Update using across
:
df %>%
mutate(D = rowSums(across(A:C), na.rm = TRUE))
OR
We could use mutate
with rowSums
:
library(dplyr)
df %>%
mutate(D = rowSums(.[1:3], na.rm = TRUE))
A B C D E F G
1 1 NA NA 1 1 1 1
2 NA 2 NA 2 1 1 1
3 NA 4 NA 4 1 1 1
4 NA NA 2 2 1 1 1
5 NA NA 3 3 1 1 1
data:
df <- structure(list(A = c(1L, NA, NA, NA, NA), B = c(NA, 2L, 4L, NA,
NA), C = c(NA, NA, NA, 2L, 3L), D = c(1L, 2L, 4L, 2L, 3L), E = c(1L,
1L, 1L, 1L, 1L), F = c(1L, 1L, 1L, 1L, 1L), G = c(1L, 1L, 1L,
1L, 1L)), class = "data.frame", row.names = c(NA, -5L))