I have a dataset that I am needing to use dplyr::coalesce()
on. But I want to do this multiple times and am not sure about what is a more efficient way of doing this (e.g. loop, apply, etc).
To give you a toy example, say my dataset is:
df = data.frame(
a = c(1, NA, NA),
a.1 = c(NA, 1, NA),
a.2 = c(NA, NA, 1),
b = c(2, NA, NA),
b.1 = c(NA, 2, NA),
b.2 = c(NA, NA, 2),
c = c(3, NA, NA),
c.1 = c(NA, 3, NA),
c.2 = c(NA, NA, 3)
)
And I could do this:
new_df = df |>
dplyr::mutate(
a = dplyr::coalesce(a, a.1, a.2),
b = dplyr::coalesce(b, b.1, b.2),
c = dplyr::coalesce(c, c.1, c.2)
) |>
dplyr::select(a, b, c)
Which would give me:
new_df
a b c
1 1 2 3
2 1 2 3
3 1 2 3
First, how could I efficiently do this without having to write coalesce n times? This example here is just an example and I'd really need to do this forty times with the dataset.
Also, is there a way to do it as I have here where I basically just keep a, b, and c rather than naming it as a.1 or whatever?
CodePudding user response:
If columns are like something
and somthing.etc
shape,
you may try
library(dplyr)
library(stringr)
df %>%
split.default(str_remove(names(.), "\\..*")) %>%
map_df(~ coalesce(!!! .x))
a b c
<dbl> <dbl> <dbl>
1 1 2 3
2 1 2 3
3 1 2 3
CodePudding user response:
Here is an alternative with pivoting:
library(dplyr)
library(tidyr)
df %>%
pivot_longer(everything()) %>%
mutate(name = sub("\\..*", "", name)) %>%
drop_na %>%
pivot_wider(names_from = name, values_from = value, values_fn = list) %>%
unnest(cols = c(a, b, c))
a b c
<dbl> <dbl> <dbl>
1 1 2 3
2 1 2 3
3 1 2 3