Home > Enterprise >  Cumulative Sum of String
Cumulative Sum of String

Time:02-02

I have a table which looks like:

Order Col A Col B
1 a 2,3,4,5
2 a 3,5,6,7,8
3 a 1,2,4,9
4 a 3,5,7,11,12

I want to aggregate this table by Col A. The output should look like the following:

Order Col A Col B Col C
1 a 2,3,4,5 2,3,4,5
2 a 3,5,6,7,8 2,3,4,5,6,7,8
3 a 1,2,4,9 1,2,3,4,5,6,7,8,9
4 a 3,5,7,11,12 1,2,3,4,5,6,7,8,9,11,12

Please guide me on how I get the desirable output in R?

CodePudding user response:

This ought to do it:

library(dplyr)
df %>%
  group_by(ColA) %>%
  mutate(
    result = strsplit(Colb, split = ","),
    result = lapply(result, as.numeric),
    result = Reduce(f = union, x = result, accumulate = TRUE),
    result = lapply(result, sort),
    result = sapply(result, paste, collapse = ",")
  ) %>%
  ungroup()
# # A tibble: 4 × 4
#   Order ColA  Colb        result                 
#   <int> <chr> <chr>       <chr>                  
# 1     1 a     2,3,4,5     2,3,4,5                
# 2     2 a     3,5,6,7,8   2,3,4,5,6,7,8          
# 3     3 a     1,2,4,9     1,2,3,4,5,6,7,8,9      
# 4     4 a     3,5,7,11,12 1,2,3,4,5,6,7,8,9,11,12

Using this data:

df = read.table(text = "Order   ColA    Colb
1   a   '2,3,4,5'
2   a   '3,5,6,7,8'
3   a   '1,2,4,9'
4   a   '3,5,7,11,12' ", header = T)

CodePudding user response:

df %>%
  group_by(ColA)%>%
  mutate(ColC = map_chr(accumulate(strsplit(ColB,','), 
                                   ~union(.x,.y)), str_c, collapse=','))
# A tibble: 4 × 4
# Groups:   ColA [1]
  Order ColA  ColB        ColC                   
  <int> <chr> <chr>       <chr>                  
1     1 a     2,3,4,5     2,3,4,5                
2     2 a     3,5,6,7,8   2,3,4,5,6,7,8          
3     3 a     1,2,4,9     2,3,4,5,6,7,8,1,9      
4     4 a     3,5,7,11,12 2,3,4,5,6,7,8,1,9,11,12
  • Related