Home > Blockchain >  Reshaping data frame with many NAs
Reshaping data frame with many NAs

Time:01-18

I have a data frame in R with four variables:

id var1 var2 var3
1 NA 0.4 NA
1 0.8 NA NA
2 0.7 NA NA
2 NA 0.5 NA
2 NA NA 0.1
3 NA 0.5 NA
3 NA NA 0.2

There are repeated entries per id and each observation only contains one data value besides the id. I would like to obtain one observation per id with all of the data values "collected". The output should look like this:

id var1 var2 var3
1 0.8 0.4 NA
2 0.7 0.5 0.1
3 NA 0.5 0.2

I have played around with pivot_wider, data.table, gather, but am not getting anywhere. It seems to me that this should be very simple. Like some sort of collapse. Grateful for any pointers.

CodePudding user response:

Or using summarise per group:

library(dplyr)

df |>
  group_by(id) |>
  summarise(across(everything(), ~ first(na.omit(.))))

Output:

# A tibble: 3 × 4
     id  var1  var2  var3
  <int> <dbl> <dbl> <dbl>
1     1   0.8   0.4  NA  
2     2   0.7   0.5   0.1
3     3  NA     0.5   0.2

Thanks to @Darren Tsai for the data.

CodePudding user response:

You can use fill by group and then subset unique rows.

library(dplyr)
library(tidyr)

df %>%
  group_by(id) %>%
  fill(var1:var3, .direction = "downup") %>%
  distinct() %>%
  ungroup()

# # A tibble: 3 × 4
#      id  var1  var2  var3
#   <int> <dbl> <dbl> <dbl>
# 1     1   0.8   0.4  NA  
# 2     2   0.7   0.5   0.1
# 3     3  NA     0.5   0.2

Data
df <- read.table(text = "
id  var1    var2    var3
1   NA  0.4 NA
1   0.8 NA  NA
2   0.7 NA  NA
2   NA  0.5 NA
2   NA  NA  0.1
3   NA  0.5 NA
3   NA  NA  0.2", header = TRUE)

CodePudding user response:

You can first pivot_longer, then remove NA, and finally pivot_widerback again:

library(tidyverse)
df %>%
  pivot_longer(-id) %>%
  na.omit() %>%
  pivot_wider(names_from = name, values_from = value)
# A tibble: 3 × 4
     id  var2  var1  var3
  <dbl> <dbl> <dbl> <dbl>
1     1   0.4   0.8  NA  
2     2   0.5   0.7   0.1
3     3   0.5  NA     0.2
  • Related