Row-wise coalesce over all columns

Time:02-01

How can we get the first non-missing value in each row (a row-wise coalesce) across all columns with dplyr (tidyverse), without specifying column names?

Example data:

df <- data.frame(x = c(NA, "s3", NA, NA,"s4"),
                 y = c("s1", NA, "s6", "s7", "s4"),
                 z = c("s1", NA, NA, "s7", NA))

We could use do.call, but this does not look tidy-like:

library(dplyr)

df$xyz <- do.call(coalesce, df)
#      x    y    z xyz
# 1 <NA>   s1   s1  s1
# 2   s3 <NA> <NA>  s3
# 3 <NA>   s6 <NA>  s6
# 4 <NA>   s7   s7  s7
# 5   s4   s4 <NA>  s4

This works, but I don't want to specify columns:

df %>% 
  mutate(xyz = coalesce(x, y, z))
#      x    y    z xyz
# 1 <NA>   s1   s1  s1
# 2   s3 <NA> <NA>  s3
# 3 <NA>   s6 <NA>  s6
# 4 <NA>   s7   s7  s7
# 5   s4   s4 <NA>  s4

I am looking for something similar to this data.table approach:

library(data.table)
setDT(df)[, xyz := fcoalesce(.SD) ][]
#       x    y    z xyz
# 1: <NA>   s1   s1  s1
# 2:   s3 <NA> <NA>  s3
# 3: <NA>   s6 <NA>  s6
# 4: <NA>   s7   s7  s7
# 5:   s4   s4 <NA>  s4

Failed attempts:

df %>% 
  mutate(xyz = coalesce(all_vars()))

df %>% 
  mutate(xyz = coalesce(c_across(all_vars())))

df %>% 
  rowwise() %>% 
  mutate(xyz = coalesce(all_vars()))

df %>% 
  rowwise() %>% 
  mutate(xyz = coalesce(c_across(all_vars())))

Any ideas?

CodePudding user response:

Taken from this GitHub discussion, you can create a coacross function:

coacross <- function(...) {
  coalesce(!!!across(...))
}

df %>% 
  mutate(xyz = coacross(everything()))

     x    y    z xyz
1 <NA>   s1   s1  s1
2   s3 <NA> <NA>  s3
3 <NA>   s6 <NA>  s6
4 <NA>   s7   s7  s7
5   s4   s4 <NA>  s4
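
Because coacross() forwards its arguments to across(), it should also accept any tidyselect expression, not only everything(). A minimal sketch, assuming dplyr >= 1.0.0, R >= 4.0 (so the strings stay character vectors) and the example data from the question; the column name yz is purely illustrative:

```r
library(dplyr)

df <- data.frame(x = c(NA, "s3", NA, NA, "s4"),
                 y = c("s1", NA, "s6", "s7", "s4"),
                 z = c("s1", NA, NA, "s7", NA))

# Helper from the answer above: across() returns the selected columns,
# and !!! splices them into coalesce() as separate arguments.
coacross <- function(...) {
  coalesce(!!!across(...))
}

# Coalesce only y and z, in that order (yz is a hypothetical column name).
res <- df %>%
  mutate(yz = coacross(c(y, z)))

res$yz
# "s1" NA "s6" "s7" "s4"
```

Row 2 stays NA here because both y and z are missing in that row; the order of the selection decides which column wins when several are non-missing.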

CodePudding user response:

We can inject the columns of the data frame into coalesce() as separate arguments using the splice operator !!!.

library(dplyr)

df %>% mutate(xyz = coalesce(!!!df))

Or more "tidyverse" like:

df %>% mutate(xyz = coalesce(!!!select(., everything())))

Output

     x    y    z xyz
1 <NA>   s1   s1  s1
2   s3 <NA> <NA>  s3
3 <NA>   s6 <NA>  s6
4 <NA>   s7   s7  s7
5   s4   s4 <NA>  s4
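
To see what the splice does, it can help to compare it with the explicit call. The following is a small sketch, assuming the example data from the question and a dplyr version whose coalesce() accepts spliced arguments; the names spliced and explicit are only for illustration:

```r
library(dplyr)

df <- data.frame(x = c(NA, "s3", NA, NA, "s4"),
                 y = c("s1", NA, "s6", "s7", "s4"),
                 z = c("s1", NA, NA, "s7", NA))

# A data frame is a list of columns, so !!!df passes each column
# to coalesce() as its own argument, in column order.
spliced  <- coalesce(!!!df)

# The same call written out by hand.
explicit <- coalesce(df$x, df$y, df$z)

identical(spliced, explicit)
# TRUE
```

Note that !!!df refers to the data frame as it exists before the pipeline runs, so columns created earlier in the same pipe are not included.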

CodePudding user response:

This is a possible solution:

df %>%
  mutate(xyz = do.call(coalesce, across(everything())))

#>     x    y    z xyz
#> 1 <NA>   s1   s1  s1
#> 2   s3 <NA> <NA>  s3
#> 3 <NA>   s6 <NA>  s6
#> 4 <NA>   s7   s7  s7
#> 5   s4   s4 <NA>  s4
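
In newer dplyr releases, pick() is the recommended way to grab a set of columns inside mutate() without applying a function to them. A hedged variant of the same idea, assuming dplyr >= 1.1.0 (where pick() was introduced) and the example data from the question:

```r
library(dplyr)

df <- data.frame(x = c(NA, "s3", NA, NA, "s4"),
                 y = c("s1", NA, "s6", "s7", "s4"),
                 z = c("s1", NA, NA, "s7", NA))

# pick(everything()) returns the current columns as a data frame;
# do.call() then passes each column to coalesce() as one argument.
res <- df %>%
  mutate(xyz = do.call(coalesce, pick(everything())))

res$xyz
# "s1" "s3" "s6" "s7" "s4"
```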