Home > Software design >  How to replace na in a column with the first non-missing value without dropping cases that only have
How to replace na in a column with the first non-missing value without dropping cases that only have

Time:03-22

I have a long data frame that has many NAs, but I want to condenses it so all NAs are filled with the first non-missing value when grouped by a variable--but if the observation only has NAs, it keeps it. Until I updated R, I had a code that worked (shown below), but now it deletes rows if one of their columns is all NAs.

Here's a sample dataset:

library(dplyr)

test <- tibble(name = c("J", "C", "J", "C"),
               test_1 = c(1:2, NA, NA),
               test_2 = c(NA, NA, 3:4),
               make_up_test = c(NA, 1, NA, NA))

And here's what used to work--but now deletes observations that only have NAs in one column (see J getting dropped because he only has NAs for test observation)


test %>%
  group_by(name) %>%
  summarise_all(~first(na.omit(.)))

This is what I'm hoping to get:

solution <- tibble(name = c("J", "C"),
                test_1 = c(1:2),
                test_2 = c(3:4),
                make_up_test = c(NA, 1))

CodePudding user response:

We remove the NA with na.omit and get the first element - use [1] to coerce to NA if there are no non-NA elements present

library(dplyr)
test %>% 
  group_by(name) %>% 
  summarise(across(everything(), ~ first(na.omit(.x))[1]))

-output

# A tibble: 2 × 4
  name  test_1 test_2 make_up_test
  <chr>  <int>  <int>        <dbl>
1 C          2      4            1
2 J          1      3           NA

CodePudding user response:

Here is an approach with pivoting:

library(tidyr)
library(dplyr)

test %>% 
  pivot_longer(-name, names_to = "names") %>%  
  drop_na() %>% 
  pivot_wider(names_from = names, values_from = value) %>% 
  relocate(test_2, .after = test_1)
  name  test_1 test_2 make_up_test
  <chr>  <dbl>  <dbl>        <dbl>
1 J          1      3           NA
2 C          2      4            1
  • Related