I have a dataframe that takes similar form as the toy dataframe below. I would like to merge the rows if var1, var2, and var3 are all equal values, creating a combination of data in the merged rows. For rows 4 - 6, where there are different values in the rows, I was wondering if there is a way to put them in the same column with a seperator in between.
df <- data.frame(var1 = c("1635", "1635", "1729", "1847", "1847", "1847"),
var2 = c("Aa", "Aa", "Bb", "Cc", "Cc", "Cc"),
var3 = c("28", "28", "85", "27", "27", "27"),
var4 = c("apple", NA, "orange", "pear", NA, NA),
var5 = c(NA, "tree", NA, NA, "ground", "desk")
)
So the output would look something like this:
CodePudding user response:
in base R you would do:
aggregate(.~var1 var2 var3, df, \(x)toString(unique(na.omit(x))), na.action = identity)
var1 var2 var3 var4 var5
1 1847 Cc 27 pear ground, desk
2 1635 Aa 28 apple tree
3 1729 Bb 85 orange
in tidyverse:
library(tidyverse)
df %>%
group_by(var1,var2,var3) %>%
summarize(across(var4:var5, ~toString(unique(na.omit(.x)))),.groups = 'drop')
# Groups: var1, var2 [3]
var1 var2 var3 var4 var5
<chr> <chr> <chr> <chr> <chr>
1 1635 Aa 28 apple "tree"
2 1729 Bb 85 orange ""
3 1847 Cc 27 pear "ground, desk"
CodePudding user response:
With dplyr
, you can group_by
the three columns, then use summarize
to concatenate the strings if they are not NA
.
library(dplyr)
df %>%
group_by(var1, var2, var3) %>%
summarize(across(var4:var5, ~ifelse(all(is.na(.x)), NA, paste0(na.omit(.x), collapse = ","))), .groups = "drop")
# A tibble: 3 × 5
var1 var2 var3 var4 var5
<chr> <chr> <chr> <chr> <chr>
1 1635 Aa 28 apple tree
2 1729 Bb 85 orange NA
3 1847 Cc 27 pear ground,desk
CodePudding user response:
With data.table
:
setDT(df)
df[,
lapply(.SD, \(x) if (all(is.na(x))) NA_character_ else paste(na.omit(x), collapse = "; ")),
by = var1:var3]
setDF(df)
# var1 var2 var3 var4 var5
# <char> <char> <char> <char> <char>
# 1: 1635 Aa 28 apple tree
# 2: 1729 Bb 85 orange <NA>
# 3: 1847 Cc 27 pear ground; desk