I have a data frame like the following, but with more columns (444 tickers) and 12 months.
date <- as.Date(c("2020-01-01", "2020-02-01", "2020-03-01"))
ticker1 <- c(0.01, 0.02, 0.03)
ticker2 <- c(0.015, 0.025, 0.035)
mydata <- data.frame(date, ticker1, ticker2)
I would like to expand the monthly data to daily and found some neat code for that here:
mydata %>% mutate(date = ymd(date)) %>%
  group_by(date) %>%
  expand(date = seq(floor_date(date, unit = "month"),
                    ceiling_date(date, unit = "month") - days(1), by = "day"), ticker1) %>%
  as.data.frame()
The floor/ceiling date approach only works for columns that are addressed explicitly (e.g. by column name or via $). It won't let me pass a vector of column names, and typing in 444 column names manually seems really inefficient.
If I use a vector, the dates get messed up: multiple values appear for the same date.
What happens:

|date|ticker1|ticker2|...ticker n|
|:--|:--:|:--:|--:|
|2020-01-01|0.01|0.015|0.xxx|
|2020-01-01|0.02|0.025|0.xxx|
|2020-01-01|0.03|0.035|0.xxx|

What I want:

|date|ticker1|ticker2|...ticker n|
|:--|:--:|:--:|--:|
|2020-01-01|0.01|0.015|0.xxx|
|2020-01-02|0.01|0.015|0.xxx|
|2020-01-03|0.01|0.015|0.xxx|
I don’t know how to correctly address all columns of the data frame such that the intended structure of the data frame is maintained.
Thanks a ton!
CodePudding user response:
You can do this with an extra step: first create the full range of daily dates, then join it with the original dataset, sort by date, and use fill to replace the resulting NAs with the value from the first of the month.
library(dplyr)
library(tidyr)
library(lubridate)
date <- ymd(c("2020-01-01", "2020-02-01", "2020-03-01"))
ticker1 <- c(0.01, 0.02, 0.03)
ticker2 <- c(0.015, 0.025, 0.035)
df1 <- data.frame(date, ticker1, ticker2)
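# Build every calendar day of each month present in df1
all_dates <- df1 %>%
  group_by(date) %>%
  expand(date = seq(floor_date(date, unit = "month"),
                    ceiling_date(date, unit = "month") - days(1), by = "day"))
# Join the monthly values onto the daily dates, sort, and carry each value forward
out <- df1 %>%
  right_join(all_dates, by = "date") %>%
  arrange(date) %>%
  fill(starts_with("ticker"))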
head(out)
        date ticker1 ticker2
1 2020-01-01    0.01   0.015
2 2020-01-02    0.01   0.015
3 2020-01-03    0.01   0.015
4 2020-01-04    0.01   0.015
5 2020-01-05    0.01   0.015
6 2020-01-06    0.01   0.015
EDIT: If the column names are actual ticker symbols (so they don't share a common "ticker" prefix), you can use:
out <- df1 %>%
  right_join(all_dates, by = "date") %>%
  arrange(date) %>%
  fill(everything())
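If you would rather skip the join and fill, here is a minimal sketch of another way to get the same daily layout for any number of columns. It is an alternative I am suggesting, not part of the approach above: it repeats each monthly row once per day of its month and then rebuilds the date column. The name `daily` is just a placeholder, and the sketch assumes there is exactly one row per month.

```r
library(lubridate)

# Same toy data as above; any number of value columns works the same way
df1 <- data.frame(
  date    = ymd(c("2020-01-01", "2020-02-01", "2020-03-01")),
  ticker1 = c(0.01, 0.02, 0.03),
  ticker2 = c(0.015, 0.025, 0.035)
)

# Number of days in each row's month (assumption: one row per month)
n_days <- unname(days_in_month(df1$date))

# Repeat each monthly row once per day of its month ...
daily <- df1[rep(seq_len(nrow(df1)), times = n_days), ]

# ... then replace the repeated date with the consecutive days of that month
daily$date <- floor_date(daily$date, unit = "month") + sequence(n_days) - 1
rownames(daily) <- NULL

head(daily, 3)
```

Because no column is named explicitly, this should carry over unchanged to a data frame with 444 ticker columns, as long as the date column is called date.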
P.S. Don't call a data.frame df. df is already a function in base R (the density of the F distribution). Using df as a variable name might lead to unexpected errors in your code.
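To see why the name is risky, here is a small sketch you can run in a fresh R session where no variable called df has been created. The variable `msg` is only there for illustration.

```r
# With no user-defined df, the name resolves to the function stats::df
is.function(df)

# Subsetting it raises an error instead of returning a column;
# tryCatch captures the message so the script keeps running
msg <- tryCatch(df$ticker1, error = function(e) conditionMessage(e))
print(msg)
```

So if the line that creates your own df fails or is skipped, later code does not complain about a missing object. It fails with an error about the function instead, which is harder to trace.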