R: Sum Two Lists Embedded in Rows of a Dataframe by Groups-CodePudding

I have data on groups of bonds and their cash flows. Each IssueID contains multiple BondIDs. Each BondID has its own cash flow, rendered as a list within the data frame via dplyr mutate. The cash flow lists do not have equal numbers of elements. The data structure is:

IssueID	BondID	cashflow
AA	AA1	c(-1000, 50, 50, 1050)
AA	AA2	c(-1000, 25, 25, 25, 25, 1025)
AB	AB1	c(-2000, 100, 100, 2100)
AB	AB1	c(-1000, 75, 75, 75, 75, 1075)

I need to sum each BondID's cashflows by IssueID while maintaining each element's position in the list. The output needs to look like:

IssueID	sumcashflow
AA	c(-2000, 75, 75, 1075, 25, 1025)
AB	c(-3000, 175, 175, 2175, 75, 1075)

Appreciate any assistance. Thanks.

Am not able to sum the lists.

CodePudding user response：

Here is one option - grouped by 'IssueID', convert the cashflow list to a named list with sequence (row_number()), then convert to a tibble using enframe, unnest the list column, create a grouping by the rowid of the 'name' (in case there are unequal lengths), and get the sum of 'value', pull the column as a list

library(dplyr)
library(tibble)
library(tidyr)
library(data.table)
out <- df1 %>%
  group_by(IssueID) %>%
  summarise(sumcashflow = setNames(cashflow, row_number()) %>%
         enframe %>%
         unnest(value) %>%
         group_by(grp = rowid(name)) %>%
         summarise(value = sum(value, na.rm = TRUE)) %>%
         pull(value) %>%
        list(.))

-output

> out$sumcashflow
[[1]]
[1] -2000    75    75  1075    25  1025

[[2]]
[1] -3000   175   175  2175    75  1075

> out
# A tibble: 2 × 2
  IssueID sumcashflow
  <chr>   <list>     
1 AA      <dbl [6]>  
2 AB      <dbl [6]>

or using base R with split

lst1 <- lapply(split(df1$cashflow, df1$IssueID), \(x) {
      mx <- max(lengths(x))
     rowSums(sapply(x, `length<-`, mx), na.rm = TRUE)
})

> lst1
$AA
[1] -2000    75    75  1075    25  1025

$AB
[1] -3000   175   175  2175    75  1075

data

df1 <- structure(list(IssueID = c("AA", "AA", "AB", "AB"), BondID = c("AA1", 
"AA2", "AB1", "AB1"), cashflow = list(c(-1000, 50, 50, 1050), 
    c(-1000, 25, 25, 25, 25, 1025), c(-2000, 100, 100, 2100), 
    c(-1000, 75, 75, 75, 75, 1075))), row.names = c(NA, -4L), class = c("tbl_df", 
"tbl", "data.frame"))

CodePudding user response：

df1 %>%
  group_by(IssueID) %>%
  summarise(val = list(colSums(do.call(qpcR:::rbind.na, cashflow), na.rm = TRUE)))

# A tibble: 2 x 2
  IssueID val      
  <chr>   <list>   
1 AA      <dbl [6]>
2 AB      <dbl [6]>

val:

[[1]]
[1] -2000    75    75  1075    25  1025

[[2]]
[1] -3000   175   175  2175    75  1075