I have the following kind of dataframe, with thousands of columns and rows. First column contains dates, and the following columns contain asset return index corresponding to that date.
DATE | Asset_1 | Asset_2 | Asset_3 | Asset_4 |
---|---|---|---|---|
1/1/2000 | 1000 | 300 | 2900 | NA |
2/1/2000 | 1100 | 350 | 2950 | NA |
3/1/2000 | 1200 | 330 | 2970 | 100 |
4/1/2000 | 1200 | 360 | 3000 | 200 |
5/1/2000 | 1200 | 370 | 3500 | 300 |
My problem is that delisted assets, such as Asset_1 in my example, remain in my dataset after the delisting, distorting my further processing. I have another dataframe where the delisting dates of the assets are listed. For example:
ASSET_CODE | DELIST_DATE |
---|---|
Asset_1 | 3/1/2000 |
Asset_2 | NA |
Asset_3 | NA |
Asset_4 | NA |
What i want to achieve, is to remove all observations from delisted assets after the delisting date, i.e. if DATE>DELIST_DATE(ASSET_CODE). This would be quite straightforward in Excel using a combination of IF and indexmatching, but can't figure how to do this in R.
My desired output would be:
DATE | Asset_1 | Asset_2 | Asset_3 | Asset_4 |
---|---|---|---|---|
1/1/2000 | 1000 | 300 | 2900 | NA |
2/1/2000 | 1100 | 350 | 2950 | NA |
3/1/2000 | 1200 | 330 | 2970 | 100 |
4/1/2000 | NA | 360 | 300 | 200 |
5/1/2000 | NA | 370 | 350 | 300 |
CodePudding user response:
You can do this with a join filter. But you have two add'l steps that need to be taken care of.
convert the dates to a
Date
data format so they can be compared chronologically. While Excel famously likes to make everything into a date, R needs to be told that your character string represents a date and shouldn't be sorted "alphabetically."For a join, it's easier if your data is in long format.
4 of the seven steps below relate to the dates and reshaping.
library(dplyr)
asset_returns %>%
mutate(DATE = as.Date(DATE, "%m/%d/%Y")) %>% # DATES
tidyr::pivot_longer(-DATE, names_to = "ASSET_CODE") %>% # RESHAPE
left_join(delistings) %>%
mutate(DELIST_DATE = as.Date(DELIST_DATE, "%m/%d/%Y")) %>% # DATES
filter(DATE <= DELIST_DATE | is.na(DELIST_DATE)) %>%
select(-DELIST_DATE) %>%
tidyr::pivot_wider(names_from = ASSET_CODE, values_from = value) # RESHAPE
Result
Joining, by = "ASSET_CODE"
# A tibble: 5 × 5
DATE Asset_1 Asset_2 Asset_3 Asset_4
<date> <int> <int> <int> <int>
1 2000-01-01 1000 300 2900 NA
2 2000-02-01 1100 350 2950 NA
3 2000-03-01 1200 330 2970 100
4 2000-04-01 NA 360 3000 200
5 2000-05-01 NA 370 3500 300
Reproducible data
asset_returns <- data.frame(
stringsAsFactors = FALSE,
DATE = c("1/1/2000","2/1/2000",
"3/1/2000","4/1/2000","5/1/2000"),
Asset_1 = c(1000L, 1100L, 1200L, 1200L, 1200L),
Asset_2 = c(300L, 350L, 330L, 360L, 370L),
Asset_3 = c(2900L, 2950L, 2970L, 3000L, 3500L),
Asset_4 = c(NA, NA, 100L, 200L, 300L)
)
delistings <- data.frame(
stringsAsFactors = FALSE,
ASSET_CODE = c("Asset_1", "Asset_2", "Asset_3", "Asset_4"),
DELIST_DATE = c("3/1/2000", NA, NA, NA)
)