How can I make a custom aggregation of a dataframe in R?

I have a dataframe such as

group <- c("A", "A", "B", "C", "C")
tx <- c("A-201", "A-202", "B-201", "C-205", "C-206")
feature <- c("coding", "decay", "pending", "coding", "coding")
df <- data.frame(group, tx, feature)

I want to generate a new df with one row per group and one column per feature, where each cell lists the tx entries for that group and feature, joined with "|". I want the output to look like

group <- c("A", "B", "C")
coding <- c("A-201", NA, "C-205|C-206")
decay <- c("A-202", NA, NA)
pending <- c(NA, "B-201", NA)
df.out <- data.frame(group, coding, decay, pending)

So far I have not found a way to do this with a dplyr function. Do I have to loop through my initial df?

CodePudding user response：

You can reshape the data to wide format with tidyr::pivot_wider and pass a function to values_fn that collapses multiple tx values in a cell into one string -

df.out <- tidyr::pivot_wider(df, names_from = feature, values_from = tx, 
         values_fn = function(x) paste0(x, collapse = '|'))

df.out

#  group coding      decay pending
#  <chr> <chr>       <chr> <chr>  
#1 A     A-201       A-202 NA     
#2 B     NA          NA    B-201  
#3 C     C-205|C-206 NA    NA
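
To see why values_fn is needed here, the sketch below compares the call with and without it. It assumes a recent tidyr in which values_fn accepts a single function; the names wide_list and wide_chr are only illustrative. Without values_fn, the two coding entries for group C cannot fit in one cell, so pivot_wider warns and falls back to list-columns.

```r
library(tidyr)

df <- data.frame(
  group   = c("A", "A", "B", "C", "C"),
  tx      = c("A-201", "A-202", "B-201", "C-205", "C-206"),
  feature = c("coding", "decay", "pending", "coding", "coding")
)

# Without values_fn: group C has two "coding" rows, so tidyr warns and
# returns list-columns (the warning is suppressed here to keep output quiet).
wide_list <- suppressWarnings(
  pivot_wider(df, names_from = feature, values_from = tx)
)
is.list(wide_list$coding)

# With values_fn: the values in each cell are collapsed into a single string.
wide_chr <- pivot_wider(
  df,
  names_from  = feature,
  values_from = tx,
  values_fn   = function(x) paste(x, collapse = "|")
)
is.character(wide_chr$coding)
```

Cells with no matching rows stay NA in the collapsed version, because values_fn is only applied where at least one value exists.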

CodePudding user response：

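Here is an alternative way: collapse tx within each group/feature pair first, drop the duplicated rows, then pivot to wide format: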

library(dplyr)
library(tidyr)

df %>% 
  group_by(group, feature) %>% 
  mutate(tx = paste(tx, collapse = "|")) %>% 
  distinct() %>% 
  pivot_wider(
    names_from = feature, 
    values_from = tx
  )

#   group coding      decay pending
#   <chr> <chr>       <chr> <chr>  
# 1 A     A-201       A-202 NA     
# 2 B     NA          NA    B-201  
# 3 C     C-205|C-206 NA    NA
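
The mutate() plus distinct() steps can also be folded into a single summarise() call. This is a sketch of that variant, not the answer's own code. The .groups = "drop" argument assumes dplyr 1.0.0 or later and makes the result ungrouped; the name wide is illustrative.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  group   = c("A", "A", "B", "C", "C"),
  tx      = c("A-201", "A-202", "B-201", "C-205", "C-206"),
  feature = c("coding", "decay", "pending", "coding", "coding")
)

wide <- df %>%
  group_by(group, feature) %>%
  # One row per group/feature pair, with all tx values joined by "|".
  summarise(tx = paste(tx, collapse = "|"), .groups = "drop") %>%
  # Spread the features into columns; missing combinations become NA.
  pivot_wider(names_from = feature, values_from = tx)

wide
```

Because summarise() already returns one row per group/feature pair, no distinct() step is needed before pivoting.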

CodePudding user response：

Using dcast from data.table, with an aggregation function that joins the tx values:

library(data.table)
# setDT() converts df to a data.table by reference (df itself is modified)
dcast(setDT(df), group ~ feature, value.var = 'tx', 
   fun.aggregate = function(x) paste(x, collapse = "|"), fill = NA)

#    group      coding decay pending
# 1:     A       A-201 A-202    <NA>
# 2:     B        <NA>  <NA>   B-201
# 3:     C C-205|C-206  <NA>    <NA>
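
For comparison, and as an answer to the "do I have to loop" part of the question, the same reshaping can be sketched in base R without any packages. This is not from the answers above. It relies on tapply() returning a group-by-feature matrix in which empty cells are NA; the intermediate name m is illustrative.

```r
df <- data.frame(
  group   = c("A", "A", "B", "C", "C"),
  tx      = c("A-201", "A-202", "B-201", "C-205", "C-206"),
  feature = c("coding", "decay", "pending", "coding", "coding"),
  stringsAsFactors = FALSE
)

# Cross-tabulate tx by group (rows) and feature (columns), joining the
# values in each cell with "|". Cells with no data are left as NA.
m <- tapply(df$tx, list(group = df$group, feature = df$feature),
            paste, collapse = "|")

# Turn the matrix into a data frame with group as an ordinary column.
df.out <- data.frame(group = rownames(m),
                     as.data.frame(m, stringsAsFactors = FALSE),
                     stringsAsFactors = FALSE)
rownames(df.out) <- NULL

df.out
```

Note that tapply() orders rows and columns by sorted factor levels, not by first appearance; for this data both orders happen to coincide.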