I am trying to compute the TPVI [(current value dividends)/invested] over time of several companies in a dataframe.
My dataframe looks like this (dates in this example are not exact)
Company | Date | Metric | Value |
---|---|---|---|
Company1 | 11-15-2015 | invested | 100 |
Company1 | 11-15-2015 | current value | 120 |
Company1 | 11-15-2015 | dividends | 5 |
Company2 | 11-15-2015 | invested | 200 |
Company2 | 11-15-2015 | current value | 120 |
Company2 | 11-15-2015 | dividends | 30 |
... | ... | ... | ... |
I am trying to sort the values into a new dataframe that will look like this:
Company | Date | TVPI |
---|---|---|
Company1 | ... | ... |
Company1 | ... | ... |
Company1 | ... | ... |
Company1 | ... | ... |
Company2 | ... | ... |
Company2 | ... | ... |
Company2 | ... | ... |
i.e. a new TVPI row for each unique company/date pairing.
I have tried
df %>% group_by(Date, Metric) %>% summarize()
but I am not sure how to identify the proper values to summarize (in the example dataframe, the first tpvi would be (120 5)/100. Since each company/date pairing has three "values" to keep track of, the syntax of how to word this in a summarize
function eludes me. Any help appreciated.
CodePudding user response:
One way to easily perform calculations on the different Metric
s is to fist tidyr::pivot_wider()
so they're each a separate column and then use your formula to make a new column with dplyr::mutate()
. If you also supply .keep = "unused"
to mutate()
it will leave just the columns you indicated in your desired output. Also I convert your Date
column to an actual date
format instead of a character
which will make it behave better as an x
axis value in the plot. Then you can go directly into plotting. The example is pretty boring because you only supplied one Date
per Company
but this shows the general idea.
library(tidyverse)
d <- structure(list(Company = c("Company1", "Company1", "Company1", "Company2", "Company2", "Company2"), Date = c("11-15-2015", "11-15-2015", "11-15-2015", "11-15-2015", "11-15-2015", "11-15-2015"), Metric = c("invested", "current value", "dividends", "invested", "current value", "dividends"), Value = c(100L, 120L, 5L, 200L, 120L, 30L)), class = "data.frame", row.names = c(NA, -6L))
d %>%
pivot_wider(names_from = Metric, values_from = Value) %>%
mutate(TPVI = (`current value` dividends)/invested, .keep = "unused",
Date = as.Date(Date, format = "%m-%d-%Y")) %>%
ggplot(aes(Date, TPVI, color = Company))
geom_point()
geom_line()
Created on 2022-02-21 by the reprex package (v2.0.1)