I have a long data frame that has many NAs, but I want to condenses it so all NAs are filled with the first non-missing value when grouped by a variable--but if the observation only has NAs, it keeps it. Until I updated R, I had a code that worked (shown below), but now it deletes rows if one of their columns is all NAs.
Here's a sample dataset:
library(dplyr)
test <- tibble(name = c("J", "C", "J", "C"),
test_1 = c(1:2, NA, NA),
test_2 = c(NA, NA, 3:4),
make_up_test = c(NA, 1, NA, NA))
And here's what used to work--but now deletes observations that only have NAs in one column (see J getting dropped because he only has NAs for test observation)
test %>%
group_by(name) %>%
summarise_all(~first(na.omit(.)))
This is what I'm hoping to get:
solution <- tibble(name = c("J", "C"),
test_1 = c(1:2),
test_2 = c(3:4),
make_up_test = c(NA, 1))
CodePudding user response:
We remove the NA
with na.omit
and get the first
element - use [1]
to coerce to NA
if there are no non-NA elements present
library(dplyr)
test %>%
group_by(name) %>%
summarise(across(everything(), ~ first(na.omit(.x))[1]))
-output
# A tibble: 2 × 4
name test_1 test_2 make_up_test
<chr> <int> <int> <dbl>
1 C 2 4 1
2 J 1 3 NA
CodePudding user response:
Here is an approach with pivoting:
library(tidyr)
library(dplyr)
test %>%
pivot_longer(-name, names_to = "names") %>%
drop_na() %>%
pivot_wider(names_from = names, values_from = value) %>%
relocate(test_2, .after = test_1)
name test_1 test_2 make_up_test
<chr> <dbl> <dbl> <dbl>
1 J 1 3 NA
2 C 2 4 1