Home > other >  aggregate toString ignoring NA values / Concatenate rows including NAs
aggregate toString ignoring NA values / Concatenate rows including NAs

Time:01-17

Aim is to concatenate rows toString (which contain NAs) based on unique identifiers

id     year    cat_1       cat_2
001    2021    Too high    NA
001    2021    YOY error   YOY error
002    2021    Too high    Too low    
002    2021    NA          YOY error
003    2021    Too high    NA
003    2021    YOY error   NA

Looking for a more efficient solution than:

df <- df %>% group_by(id, year) %>% summarise(across(everything(), toString, na.rm = TRUE))

Which results in NA being concatenated into the string

id     year    cat_1                  cat_2
001    2021    Too high, YOY error    NA, YOY error
002    2021    Too high, NA           Too low, YOY error  
003    2021    Too high, YOY error    NA, NA

And then replacing the string NAs with blanks and blanks with NA:

df[] <- lapply(df, gsub, pattern = "NA, ", replacement = "", fixed = TRUE)
df[] <- lapply(df, gsub, pattern = ", NA", replacement = "", fixed = TRUE)
df[] <- lapply(df, gsub, pattern = "NA", replacement = "", fixed = TRUE)
df[df==""] <- NA

I assume I am misuing na.rm when using summarise? Alternatively is there a different method?

CodePudding user response:

df %>%
  group_by(id, year) %>%
  summarise(across(everything(), ~toString(na.omit(.x))))

# A tibble: 3 x 4
# Groups:   id [3]
     id  year cat_1               cat_2               
  <int> <int> <chr>               <chr>               
1     1  2021 Too high, YOY error "YOY error"         
2     2  2021 Too high            "Too low, YOY error"
3     3  2021 Too high, YOY error ""                  

Base R:

aggregate(.~id   year, df, \(x)toString(na.omit(x)), na.action = identity)

  id year               cat_1              cat_2
1  1 2021 Too high, YOY error          YOY error
2  2 2021            Too high Too low, YOY error
3  3 2021 Too high, YOY error  
  •  Tags:  
  • Related