Home > Back-end >  Removing values depending on lookup condition
Removing values depending on lookup condition

Time:02-04

I have the following kind of dataframe, with thousands of columns and rows. First column contains dates, and the following columns contain asset return index corresponding to that date.

DATE Asset_1 Asset_2 Asset_3 Asset_4
1/1/2000 1000 300 2900 NA
2/1/2000 1100 350 2950 NA
3/1/2000 1200 330 2970 100
4/1/2000 1200 360 3000 200
5/1/2000 1200 370 3500 300

My problem is that delisted assets, such as Asset_1 in my example, remain in my dataset after the delisting, distorting my further processing. I have another dataframe where the delisting dates of the assets are listed. For example:

ASSET_CODE DELIST_DATE
Asset_1 3/1/2000
Asset_2 NA
Asset_3 NA
Asset_4 NA

What i want to achieve, is to remove all observations from delisted assets after the delisting date, i.e. if DATE>DELIST_DATE(ASSET_CODE). This would be quite straightforward in Excel using a combination of IF and indexmatching, but can't figure how to do this in R.

My desired output would be:

DATE Asset_1 Asset_2 Asset_3 Asset_4
1/1/2000 1000 300 2900 NA
2/1/2000 1100 350 2950 NA
3/1/2000 1200 330 2970 100
4/1/2000 NA 360 300 200
5/1/2000 NA 370 350 300

CodePudding user response:

You can do this with a join filter. But you have two add'l steps that need to be taken care of.

  1. convert the dates to a Date data format so they can be compared chronologically. While Excel famously likes to make everything into a date, R needs to be told that your character string represents a date and shouldn't be sorted "alphabetically."

  2. For a join, it's easier if your data is in long format.


4 of the seven steps below relate to the dates and reshaping.

library(dplyr)
asset_returns %>%
  mutate(DATE = as.Date(DATE, "%m/%d/%Y")) %>%                     # DATES
  tidyr::pivot_longer(-DATE, names_to = "ASSET_CODE") %>%          # RESHAPE
  left_join(delistings) %>%
  mutate(DELIST_DATE = as.Date(DELIST_DATE, "%m/%d/%Y")) %>%       # DATES
  filter(DATE <= DELIST_DATE | is.na(DELIST_DATE)) %>%
  select(-DELIST_DATE) %>%
  tidyr::pivot_wider(names_from = ASSET_CODE, values_from = value) # RESHAPE

Result

Joining, by = "ASSET_CODE"
# A tibble: 5 × 5
  DATE       Asset_1 Asset_2 Asset_3 Asset_4
  <date>       <int>   <int>   <int>   <int>
1 2000-01-01    1000     300    2900      NA
2 2000-02-01    1100     350    2950      NA
3 2000-03-01    1200     330    2970     100
4 2000-04-01      NA     360    3000     200
5 2000-05-01      NA     370    3500     300

Reproducible data

asset_returns <- data.frame(
  stringsAsFactors = FALSE,
  DATE = c("1/1/2000","2/1/2000",
           "3/1/2000","4/1/2000","5/1/2000"),
  Asset_1 = c(1000L, 1100L, 1200L, 1200L, 1200L),
  Asset_2 = c(300L, 350L, 330L, 360L, 370L),
  Asset_3 = c(2900L, 2950L, 2970L, 3000L, 3500L),
  Asset_4 = c(NA, NA, 100L, 200L, 300L)
)

delistings <- data.frame(
  stringsAsFactors = FALSE,
        ASSET_CODE = c("Asset_1", "Asset_2", "Asset_3", "Asset_4"),
       DELIST_DATE = c("3/1/2000", NA, NA, NA)
              )
  • Related