I have a dataframe with hundreds of different investments (represented by the "id" column), their cashflows, and market value. The following example demonstrates the data that I'm working with:
df <- data.frame(date = c("2020-01-31", "2020-02-29", "2020-03-31", "2020-02-29", "2020-03-31", "2020-04-30", "2020-05-31"),
id = c("alpha", "alpha", "alpha", "bravo", "bravo", "bravo", "bravo"),
cashflow = c(-100,20,4,-50,8,12,8),
market_value = c(100,90,80,50,110,120,115))
I ultimately want to calculate the IRR per investment. However, before I can do that, I need to add only the last market value number to the corresponding cashflow. I don't care about any market values before that. In this case, the last cashflow for "alpha" investment must be 84 (i.e., 80 market value 4 cashflow) and the last cashflow for "bravo" investment must be 123 (i.e., 115 market value 8 cashflow).
Desired output:
id | cashflow |
---|---|
alpha | -100 |
alpha | 20 |
alpha | 84 |
bravo | -50 |
bravo | 8 |
bravo | 12 |
bravo | 123 |
Thanks!
CodePudding user response:
I'm not too sure on what final output you want but here's how you'd just take the last.
df %>%
mutate(total = cashflow market_value) %>%
group_by(id) %>%
slice_max(order_by = date) %>%
ungroup()
#> # A tibble: 2 × 5
#> date id cashflow market_value total
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 2020-03-31 alpha 4 80 84
#> 2 2020-05-31 bravo 8 115 123
Created on 2022-07-22 by the reprex package (v2.0.1)
EDIT - just seen what I think is your desired output, how's this?
df %>%
group_by(id) %>%
mutate(
cashflow = if_else(row_number() == n(), cashflow market_value, cashflow)
)
#> # A tibble: 7 × 4
#> # Groups: id [2]
#> date id cashflow market_value
#> <chr> <chr> <dbl> <dbl>
#> 1 2020-01-31 alpha -100 100
#> 2 2020-02-29 alpha 20 90
#> 3 2020-03-31 alpha 84 80
#> 4 2020-02-29 bravo -50 50
#> 5 2020-03-31 bravo 8 110
#> 6 2020-04-30 bravo 12 120
#> 7 2020-05-31 bravo 123 115
Created on 2022-07-22 by the reprex package (v2.0.1)