Join similar observations within a data.frame with R-CodePudding

I want to mix several observations in a data.frame using as a reference one constantly repeated variable.

Example:

id  var1    var2    var3
a   1   na  na
a   na  2   na
a   na  na  3
b   1       na
b   na  2   na
b   na  na  na
c   na  na  3
c   na  2   na
c   1   na  na

Expected result:

id  var1    var2    var3
a   1   2   3
b   1   2   na
c   1   2   3

CodePudding user response：

A possible solution (replacing "na" by NA with na_if):

library(tidyverse)

df %>% 
  na_if("na") %>% 
  group_by(id) %>% 
  summarize(across(var1:var3, ~ sort(.x)[1]))

#> # A tibble: 3 × 4
#>   id    var1  var2  var3 
#>   <chr> <chr> <chr> <chr>
#> 1 a     1     2     3    
#> 2 b     1     2     <NA> 
#> 3 c     1     2     3

CodePudding user response：

Assumptions:

"na" above is really R's native NA (not a string);
b's first row, var2 should be NA instead of an empty string ""
perhaps from the above, var1:var3 should be numbers
either you will never have a group where there is more than one non-NA in a group/column, or you don't care about anything other than the first and want the remaining discarded

library(dplyr)
dat %>%
  group_by(id) %>%
  summarize(across(everything(), ~ na.omit(.)[1]))
# # A tibble: 3 x 4
#   id     var1  var2  var3
#   <chr> <int> <int> <int>
# 1 a         1     2     3
# 2 b         1     2    NA
# 3 c         1     2     3

Data

dat <- structure(list(id = c("a", "a", "a", "b", "b", "b", "c", "c", "c"), var1 = c(1L, NA, NA, 1L, NA, NA, NA, NA, 1L), var2 = c(NA, 2L, NA, NA, 2L, NA, NA, 2L, NA), var3 = c(NA, NA, 3L, NA, NA, NA, 3L, NA, NA)), class = "data.frame", row.names = c(NA, -9L))

CodePudding user response：

Assuming that your data has NA, you can use the following base R option using the Data from @r2evans (thanks!):

aggregate(.~id, dat, mean, na.rm = TRUE, na.action=NULL)

Output:

  id var1 var2 var3
1  a    1    2    3
2  b    1    2  NaN
3  c    1    2    3