I have a data frame in R with four variables:
id | var1 | var2 | var3 |
---|---|---|---|
1 | NA | 0.4 | NA |
1 | 0.8 | NA | NA |
2 | 0.7 | NA | NA |
2 | NA | 0.5 | NA |
2 | NA | NA | 0.1 |
3 | NA | 0.5 | NA |
3 | NA | NA | 0.2 |
There are repeated entries per id and each observation only contains one data value besides the id. I would like to obtain one observation per id with all of the data values "collected". The output should look like this:
id | var1 | var2 | var3 |
---|---|---|---|
1 | 0.8 | 0.4 | NA |
2 | 0.7 | 0.5 | 0.1 |
3 | NA | 0.5 | 0.2 |
I have played around with pivot_wider, data.table, gather, but am not getting anywhere. It seems to me that this should be very simple. Like some sort of collapse. Grateful for any pointers.
CodePudding user response:
Or using summarise
per group:
library(dplyr)
df |>
group_by(id) |>
summarise(across(everything(), ~ first(na.omit(.))))
Output:
# A tibble: 3 × 4
id var1 var2 var3
<int> <dbl> <dbl> <dbl>
1 1 0.8 0.4 NA
2 2 0.7 0.5 0.1
3 3 NA 0.5 0.2
Thanks to @Darren Tsai for the data.
CodePudding user response:
You can use fill
by group and then subset unique rows.
library(dplyr)
library(tidyr)
df %>%
group_by(id) %>%
fill(var1:var3, .direction = "downup") %>%
distinct() %>%
ungroup()
# # A tibble: 3 × 4
# id var1 var2 var3
# <int> <dbl> <dbl> <dbl>
# 1 1 0.8 0.4 NA
# 2 2 0.7 0.5 0.1
# 3 3 NA 0.5 0.2
Data
df <- read.table(text = "
id var1 var2 var3
1 NA 0.4 NA
1 0.8 NA NA
2 0.7 NA NA
2 NA 0.5 NA
2 NA NA 0.1
3 NA 0.5 NA
3 NA NA 0.2", header = TRUE)
CodePudding user response:
You can first pivot_longer
, then remove NA
, and finally pivot_wider
back again:
library(tidyverse)
df %>%
pivot_longer(-id) %>%
na.omit() %>%
pivot_wider(names_from = name, values_from = value)
# A tibble: 3 × 4
id var2 var1 var3
<dbl> <dbl> <dbl> <dbl>
1 1 0.4 0.8 NA
2 2 0.5 0.7 0.1
3 3 0.5 NA 0.2