Home > Software design >  Is there a short cut in R to merge two time series variables based on a common date observation?
Is there a short cut in R to merge two time series variables based on a common date observation?

Time:05-28

I have a a time series data set with variables S&P 500 stock indice core CRB. Although most days match some don't much since in one of the variables the observations are not available. Therefore, the observations differ. For example;

Date S&P 500 Date CRB
27.05.2022 7 27.05.2022 0.89
26.05.2022 1 26.05.2022 0.54
24.05.2022 2 25.05.2022 0.73
23.05.2022 5 24.05.2022 0.75
21.05.2022 7 23.05.2022 0.63
20.05.2022 9 21.05.2022 0.64

So my question is, is there a way in R, where the days match and the dates that don't have values for the other variable will be implemented to have a result for example;

Date S&P 500 Date CRB
27.05.2022 7 27.05.2022 0.89
26.05.2022 1 26.05.2022 0.54
24.05.2022 2 24.05.2022 0.75
23.05.2022 5 23.05.2022 0.63
21.05.2022 7 21.05.2022 0.64

Ofcourse I will lose observations but that is not a problem.

Would mean a lot if someone knows a code on how to do that. thank you!

CodePudding user response:

To exclude all values in the second date column and get your desired output, you can use setdiff():

Data:

df <- read.table(text = "Date1  SP500   Date2   CRB
27.05.2022  7   27.05.2022  0.89
26.05.2022  1   26.05.2022  0.54
24.05.2022  2   25.05.2022  0.73
23.05.2022  5   24.05.2022  0.75
21.05.2022  7   23.05.2022  0.63
20.05.2022  9   21.05.2022  0.64", header = TRUE)
df[!(df$Date1 %in% setdiff(df$Date1, df$Date2)),]

output:

# Date1 SP500      Date2  CRB
# 1 27.05.2022     7 27.05.2022 0.89
# 2 26.05.2022     1 26.05.2022 0.54
# 3 24.05.2022     2 25.05.2022 0.73
# 4 23.05.2022     5 24.05.2022 0.75
# 5 21.05.2022     7 23.05.2022 0.63

Here, setdiff(df$Date1, df$Date2) finds all values that are in Date1 but not Date2. Then we index the rows in df (i.e., df[,] by finding the rows that contain only dates in Date1

CodePudding user response:

To do this while maintaining the order, make use of dplyr::inner_join

inner_join(df[1:2], df[3:4], 'Date') %>%
   mutate(Date1 = Date, .before = 'CRB')

         Date   S&P 500        Date1      CRB
1  27.05.2022    7        27.05.2022    0.89  
2  26.05.2022    1        26.05.2022    0.54  
3  24.05.2022    2        24.05.2022    0.75  
4  23.05.2022    5        23.05.2022    0.63  
5  21.05.2022    7        21.05.2022    0.64
  •  Tags:  
  • r
  • Related