If I have the following data structure in my data frame df:
a b c d
1 2 3 4
NA NA 1 2
NA 1 2 NA
NA NA NA 1
how can I create a variable that tells me, which of the variables is the first one to not have a missing value, such that:
a b c d var
1 2 3 4 a
NA NA 1 2 c
NA 1 2 NA b
NA NA NA 1 d
I need the code to work with variable names and not column numbers, because of the size of the dataset and changing the order of the variables.
I have tried:
df <- df %>% mutate(var = coalesce(deparse(substitute(a)), deparse(substitute(b)), deparse(substitute(c)), deparse(substitute(d))))
and
df <- df %>% mutate(var = deparse(substitute(do.call(coalesce, across(c(a, b, c, d))))))
trying to implement this approach. I got the code to extract the string of a variable name from: How to convert variable (object) name into String
CodePudding user response:
We can use max.col
, i.e.
names(df)[max.col(!is.na(df), ties.method = 'first')]
#[1] "a" "c" "b" "d"
Via dplyr
,
library(dplyr)
df %>%
mutate(var = names(.)[max.col(!is.na(.), ties.method = 'first')])
a b c d var
1 1 2 3 4 a
2 NA NA 1 2 c
3 NA 1 2 NA b
4 NA NA NA 1 d
You can specify variables
df %>%
mutate(var = names(.[c('a', 'b')])[max.col(!is.na(.[c('a', 'b')]), ties.method = 'first')])
a b c d var
1 1 2 3 4 a
2 NA NA 1 2 a
3 NA 1 2 NA b
4 NA NA NA 1 a
You can also change the order of the variables via select()
, i.e.
df %>%
select(c, d, b, a) %>%
mutate(new = names(.)[max.col(!is.na(.), ties.method = 'first')])
c d b a new
1 3 4 2 1 c
2 1 2 NA NA c
3 2 NA 1 NA c
4 NA 1 NA NA d
You can also call select()
again to restore the original order of columns but maintain the result from previous order, i.e.
df %>%
select(c, d, b, a) %>%
mutate(new = names(.)[max.col(!is.na(.), ties.method = 'first')]) %>%
select(names(df), new)
a b c d new
1 1 2 3 4 c
2 NA NA 1 2 c
3 NA 1 2 NA c
4 NA NA NA 1 d
To maintain all the variables at the end, you can join on the original data frame, i.e.
df %>%
select(c, d, b) %>%
mutate(new = names(.)[max.col(!is.na(.), ties.method = 'first')]) %>%
left_join(df) %>%
select(names(df), new)
Joining, by = c("c", "d", "b")
a b c d new
1 1 2 3 4 c
2 NA NA 1 2 c
3 NA 1 2 NA c
4 NA NA NA 1 d
CodePudding user response:
You can do
df %>% mutate(var = apply(., 1, \(x) names(which(!is.na(x)))[1]))
#> a b c d var
#> 1 1 2 3 4 a
#> 2 NA NA 1 2 c
#> 3 NA 1 2 NA b
#> 4 NA NA NA 1 d