I have data on groups of bonds and their cash flows. Each IssueID contains multiple BondIDs. Each BondID has its own cash flow, rendered as a list within the data frame via dplyr mutate. The cash flow lists do not have equal numbers of elements. The data structure is:
IssueID | BondID | cashflow |
---|---|---|
AA | AA1 | c(-1000, 50, 50, 1050) |
AA | AA2 | c(-1000, 25, 25, 25, 25, 1025) |
AB | AB1 | c(-2000, 100, 100, 2100) |
AB | AB1 | c(-1000, 75, 75, 75, 75, 1075) |
I need to sum each BondID's cashflows by IssueID while maintaining each element's position in the list. The output needs to look like:
IssueID | sumcashflow |
---|---|
AA | c(-2000, 75, 75, 1075, 25, 1025) |
AB | c(-3000, 175, 175, 2175, 75, 1075) |
Appreciate any assistance. Thanks.
Am not able to sum the lists.
CodePudding user response:
Here is one option - grouped by 'IssueID', convert the cashflow
list to a named list
with sequence (row_number()
), then convert to a tibble using enframe
, unnest
the list
column, create a grouping by the rowid
of the 'name' (in case there are unequal lengths), and get the sum
of 'value', pull
the column as a list
library(dplyr)
library(tibble)
library(tidyr)
library(data.table)
out <- df1 %>%
group_by(IssueID) %>%
summarise(sumcashflow = setNames(cashflow, row_number()) %>%
enframe %>%
unnest(value) %>%
group_by(grp = rowid(name)) %>%
summarise(value = sum(value, na.rm = TRUE)) %>%
pull(value) %>%
list(.))
-output
> out$sumcashflow
[[1]]
[1] -2000 75 75 1075 25 1025
[[2]]
[1] -3000 175 175 2175 75 1075
> out
# A tibble: 2 × 2
IssueID sumcashflow
<chr> <list>
1 AA <dbl [6]>
2 AB <dbl [6]>
or using base R
with split
lst1 <- lapply(split(df1$cashflow, df1$IssueID), \(x) {
mx <- max(lengths(x))
rowSums(sapply(x, `length<-`, mx), na.rm = TRUE)
})
> lst1
$AA
[1] -2000 75 75 1075 25 1025
$AB
[1] -3000 175 175 2175 75 1075
data
df1 <- structure(list(IssueID = c("AA", "AA", "AB", "AB"), BondID = c("AA1",
"AA2", "AB1", "AB1"), cashflow = list(c(-1000, 50, 50, 1050),
c(-1000, 25, 25, 25, 25, 1025), c(-2000, 100, 100, 2100),
c(-1000, 75, 75, 75, 75, 1075))), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"))
CodePudding user response:
df1 %>%
group_by(IssueID) %>%
summarise(val = list(colSums(do.call(qpcR:::rbind.na, cashflow), na.rm = TRUE)))
# A tibble: 2 x 2
IssueID val
<chr> <list>
1 AA <dbl [6]>
2 AB <dbl [6]>
val:
[[1]]
[1] -2000 75 75 1075 25 1025
[[2]]
[1] -3000 175 175 2175 75 1075