How to join two time series data frames so the resultant data frame has all the unique dates and

Time:12-11

I have two time series data frames:

import pandas as pd

df1 = pd.DataFrame({'Date': [pd.to_datetime('1980-01-03'), pd.to_datetime('1980-01-04'),
                             pd.to_datetime('1980-01-05'), pd.to_datetime('1980-01-06'),
                             pd.to_datetime('1980-01-07'), pd.to_datetime('1980-01-08')],
                    'Temp': [13.5,10,14,12,10,9]})
df1


        Date    Temp
0   1980-01-03  13.5
1   1980-01-04  10.0
2   1980-01-05  14.0
3   1980-01-06  12.0
4   1980-01-07  10.0
5   1980-01-08  9.0

and

df2 = pd.DataFrame({'Date': [pd.to_datetime('1980-01-01'), pd.to_datetime('1980-01-02'),
                             pd.to_datetime('1980-01-03'), pd.to_datetime('1980-01-04')], 
                    'Temp': [10,17,13.5,10]})
df2
        Date    Temp
0   1980-01-01  10.0
1   1980-01-02  17.0
2   1980-01-03  13.5
3   1980-01-04  10.0

My task is to join these data frames on Date so that the result contains every date that appears in either data frame, has a single entry for each date present in both, and is sorted in chronological order.

To that effect I tried the following:

df = pd.concat([df1, df2])
df.reset_index().drop(columns = ['index'], axis = 1)
        Date    Temp
0   1980-01-03  13.5
1   1980-01-04  10.0
2   1980-01-05  14.0
3   1980-01-06  12.0
4   1980-01-07  10.0
5   1980-01-08  9.0
6   1980-01-01  10.0
7   1980-01-02  17.0
8   1980-01-03  13.5
9   1980-01-04  10.0

But this is not the result I want. What I am trying to get is:

        Date    Temp
0   1980-01-01  10.0
1   1980-01-02  17.0
2   1980-01-03  13.5
3   1980-01-04  10.0
4   1980-01-05  14.0
5   1980-01-06  12.0
6   1980-01-07  10.0
7   1980-01-08  9.0

What can I do? Maybe pd.concat() is not the way to go?

CodePudding user response:

A possible solution:

pd.merge(df1, df2, how="outer").sort_values(by="Date").reset_index(drop=True)

Output:

        Date  Temp
0 1980-01-01  10.0
1 1980-01-02  17.0
2 1980-01-03  13.5
3 1980-01-04  10.0
4 1980-01-05  14.0
5 1980-01-06  12.0
6 1980-01-07  10.0
7 1980-01-08   9.0
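
A note on why this works: with no `on` argument, pd.merge joins on every column the two frames share, here both Date and Temp, so with how="outer" the rows that are identical in both frames collapse into one. If the same date could carry different Temp values in the two frames, that merge would keep both rows. The sketch below is one possible alternative that stays with pd.concat and deduplicates on Date alone. Keeping the first occurrence (the df1 row) is an assumption about which reading should win, not something stated in the question.

```python
import pandas as pd

# Same sample data as in the question.
df1 = pd.DataFrame({
    "Date": pd.to_datetime(["1980-01-03", "1980-01-04", "1980-01-05",
                            "1980-01-06", "1980-01-07", "1980-01-08"]),
    "Temp": [13.5, 10, 14, 12, 10, 9],
})
df2 = pd.DataFrame({
    "Date": pd.to_datetime(["1980-01-01", "1980-01-02",
                            "1980-01-03", "1980-01-04"]),
    "Temp": [10, 17, 13.5, 10],
})

# Stack both frames, keep one row per Date (the first one seen,
# i.e. the row from df1), then sort chronologically and renumber.
combined = (
    pd.concat([df1, df2])
    .drop_duplicates(subset="Date", keep="first")
    .sort_values("Date")
    .reset_index(drop=True)
)
print(combined)
```

For the data in the question, the overlapping dates have equal temperatures in both frames, so this should give the same eight rows as the merge above.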