I have a a time series data set with variables S&P 500 stock indice core CRB. Although most days match some don't much since in one of the variables the observations are not available. Therefore, the observations differ. For example;
Date | S&P 500 | Date | CRB |
---|---|---|---|
27.05.2022 | 7 | 27.05.2022 | 0.89 |
26.05.2022 | 1 | 26.05.2022 | 0.54 |
24.05.2022 | 2 | 25.05.2022 | 0.73 |
23.05.2022 | 5 | 24.05.2022 | 0.75 |
21.05.2022 | 7 | 23.05.2022 | 0.63 |
20.05.2022 | 9 | 21.05.2022 | 0.64 |
So my question is, is there a way in R, where the days match and the dates that don't have values for the other variable will be implemented to have a result for example;
Date | S&P 500 | Date | CRB |
---|---|---|---|
27.05.2022 | 7 | 27.05.2022 | 0.89 |
26.05.2022 | 1 | 26.05.2022 | 0.54 |
24.05.2022 | 2 | 24.05.2022 | 0.75 |
23.05.2022 | 5 | 23.05.2022 | 0.63 |
21.05.2022 | 7 | 21.05.2022 | 0.64 |
Ofcourse I will lose observations but that is not a problem.
Would mean a lot if someone knows a code on how to do that. thank you!
CodePudding user response:
To exclude all values in the second date column and get your desired output, you can use setdiff()
:
Data:
df <- read.table(text = "Date1 SP500 Date2 CRB
27.05.2022 7 27.05.2022 0.89
26.05.2022 1 26.05.2022 0.54
24.05.2022 2 25.05.2022 0.73
23.05.2022 5 24.05.2022 0.75
21.05.2022 7 23.05.2022 0.63
20.05.2022 9 21.05.2022 0.64", header = TRUE)
df[!(df$Date1 %in% setdiff(df$Date1, df$Date2)),]
output:
# Date1 SP500 Date2 CRB
# 1 27.05.2022 7 27.05.2022 0.89
# 2 26.05.2022 1 26.05.2022 0.54
# 3 24.05.2022 2 25.05.2022 0.73
# 4 23.05.2022 5 24.05.2022 0.75
# 5 21.05.2022 7 23.05.2022 0.63
Here, setdiff(df$Date1, df$Date2)
finds all values that are in Date1
but not Date2
. Then we index the rows in df
(i.e., df[,]
by finding the rows that contain only dates in Date1
CodePudding user response:
To do this while maintaining the order, make use of dplyr::inner_join
inner_join(df[1:2], df[3:4], 'Date') %>%
mutate(Date1 = Date, .before = 'CRB')
Date S&P 500 Date1 CRB
1 27.05.2022 7 27.05.2022 0.89
2 26.05.2022 1 26.05.2022 0.54
3 24.05.2022 2 24.05.2022 0.75
4 23.05.2022 5 23.05.2022 0.63
5 21.05.2022 7 21.05.2022 0.64