R aggregating list items within a dataframe grouped by another column

Time:07-05

I have a dataframe df that looks like the following:

df<-structure(list(hex = c(7L, 7L, 5L, 7L, 5L, 5L, 5L, 3L, 5L, 7L
), material_diff = list(c(0, 0, -1, 0, 0, 0), c(0, 0, -1, 0, 
0, 0), c(0, 0, -1, 0, 0, 0), c(0, 0, -1, 0, 0, 0), c(0, 0, -1, 
0, 0, 0), c(0, 0, -1, 0, 0, 0), c(0, 0, -1, 0, 0, 0), c(0, 0, 
0, 0, -0.166666666666667, 0), c(0, 0, -1, 0, 0, 0), c(0, 0, -1, 
0, 0, 0))), class = "data.frame", row.names = c(NA, -10L))

   hex                                                     material_diff
1    7                                                 0, 0, -1, 0, 0, 0
2    7                                                 0, 0, -1, 0, 0, 0
3    5                                                 0, 0, -1, 0, 0, 0
4    7                                                 0, 0, -1, 0, 0, 0
5    5                                                 0, 0, -1, 0, 0, 0
6    5                                                 0, 0, -1, 0, 0, 0
7    5                                                 0, 0, -1, 0, 0, 0
8    3 0.0000000, 0.0000000, 0.0000000, 0.0000000, -0.1666667, 0.0000000
9    5                                                 0, 0, -1, 0, 0, 0
10   7                                                 0, 0, -1, 0, 0, 0

I want to sum the vectors in material_diff and group by hex to return the following:

   hex                                                     material_diff
1    3   0.0000000, 0.0000000, 0.0000000, 0.0000000, -0.1666667, 0.0000000
2    5                                                 0, 0, -5, 0, 0, 0
3    7                                                 0, 0, -4, 0, 0, 0

How might I achieve this?

CodePudding user response:

You can use Reduce to add the vectors element-wise within each group:

library(dplyr)

df %>%
  group_by(hex) %>%
  summarise(material_diff = list(Reduce(`+`, material_diff))) %>%
  data.frame() #for better viewing. 

#  hex                                                     material_diff
#1   3 0.0000000, 0.0000000, 0.0000000, 0.0000000, -0.1666667, 0.0000000
#2   5                                                 0, 0, -5, 0, 0, 0
#3   7                                                 0, 0, -4, 0, 0, 0
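
To see what Reduce does inside each group, here is a minimal sketch on a made-up list of three short vectors (the name vecs and its values are illustrative, not taken from the question's data). Reduce applies `+` pairwise from left to right, so the result is the element-wise sum of all vectors in the list.

```r
# Hypothetical list standing in for one group's material_diff entries.
vecs <- list(c(0, 0, -1), c(0, 0, -1), c(1, 0, 0))

# Reduce folds `+` across the list: (vecs[[1]] + vecs[[2]]) + vecs[[3]]
total <- Reduce(`+`, vecs)
total
# [1]  1  0 -2
```

Wrapping the result in list() inside summarise() keeps the summed vector as a single list-column entry per group.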

CodePudding user response:

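Here is a base R way with by. Each group's vectors are stacked into a matrix with rbind and then summed column-wise. Note that the \(i) lambda shorthand requires R 4.1 or later.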

res <- data.frame(hex = sort(unique(df$hex)))
res$material_diff <- by(seq_along(df$material_diff), df$hex, \(i) {
  x <- do.call(rbind, df$material_diff[i])
  colSums(x)
})
res
#  hex                                                     material_diff
#1   3 0.0000000, 0.0000000, 0.0000000, 0.0000000, -0.1666667, 0.0000000
#2   5                                                 0, 0, -5, 0, 0, 0
#3   7                                                 0, 0, -4, 0, 0, 0
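
The rbind and colSums step can be checked in isolation. The sketch below rebuilds the five vectors that belong to hex 5 in the question's data (the names grp and m are introduced here only for illustration) and shows that stacking them as matrix rows and summing the columns gives the expected row of the result.

```r
# The five material_diff vectors for hex == 5 in the question are identical.
grp <- rep(list(c(0, 0, -1, 0, 0, 0)), 5)

# Stack the vectors as rows: a 5 x 6 matrix, one row per original vector.
m <- do.call(rbind, grp)
dim(m)
# [1] 5 6

# Column sums give the element-wise total across the group.
colSums(m)
# [1]  0  0 -5  0  0  0
```

This approach assumes every vector in a group has the same length; otherwise rbind would recycle values with a warning.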