Home > Back-end >  Iteratively dplyr::coalesce()
Iteratively dplyr::coalesce()

Time:06-03

I have a dataset that I am needing to use dplyr::coalesce() on. But I want to do this multiple times and am not sure about what is a more efficient way of doing this (e.g. loop, apply, etc).

To give you a toy example, say my dataset is:

df = data.frame(
    a = c(1, NA, NA),
    a.1 = c(NA, 1, NA),
    a.2 = c(NA, NA, 1),
    b = c(2, NA, NA),
    b.1 = c(NA, 2, NA),
    b.2 = c(NA, NA, 2),
    c = c(3, NA, NA),
    c.1 = c(NA, 3, NA),
    c.2 = c(NA, NA, 3)
)

And I could do this:

new_df = df |>
    dplyr::mutate(
        a = dplyr::coalesce(a, a.1, a.2),
        b = dplyr::coalesce(b, b.1, b.2),
        c = dplyr::coalesce(c, c.1, c.2)
    ) |>
    dplyr::select(a, b, c)

Which would give me:

new_df
  a b c
1 1 2 3
2 1 2 3
3 1 2 3

First, how could I efficiently do this without having to write coalesce n times? This example here is just an example and I'd really need to do this forty times with the dataset.

Also, is there a way to do it as I have here where I basically just keep a, b, and c rather than naming it as a.1 or whatever?

CodePudding user response:

If columns are like something and somthing.etc shape,

you may try

library(dplyr)
library(stringr)
df %>%
  split.default(str_remove(names(.), "\\..*")) %>%
  map_df(~ coalesce(!!! .x))

      a     b     c
  <dbl> <dbl> <dbl>
1     1     2     3
2     1     2     3
3     1     2     3

CodePudding user response:

Here is an alternative with pivoting:

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(everything()) %>% 
  mutate(name = sub("\\..*", "", name)) %>% 
  drop_na %>% 
  pivot_wider(names_from = name, values_from = value, values_fn = list) %>% 
  unnest(cols = c(a, b, c))
      a     b     c
  <dbl> <dbl> <dbl>
1     1     2     3
2     1     2     3
3     1     2     3
  • Related