Home > Software engineering >  Merging dataframes with uneven time series data
Merging dataframes with uneven time series data

Time:09-26

I have two dataframes with time series data whose values are a list. I want to merge them into a single dataframe, resulting in the concatenation of the two lists. The tricky part is one dataframe contains more data points than the other. When I do a "sum" of the two dataframes, for the dates that do not overlap, the values come out as NaN, instead of keeping the original value. To provide example, df1 contains the following:

2021-09-16 [AAA,BBB]    
2021-09-17 [CCC]    
2021-09-18 [DDD]

df2 contains the following:

2021-09-16 [EEE]    
2021-09-17 [FFF]    
2021-09-18 []
2021-09-19 [GGG]
2021-09-20 [HHH,III]

When I do df1 df2, I get:

2021-09-16 [AAA,BBB,EEE]    
2021-09-17 [CCC,FFF]    
2021-09-18 [DDD]
2021-09-19 NaN
2021-09-20 NaN

Instead, I want 2021-09-19 and 2021-09-20 to show [GGG] and [HHH,III], respectively. What's the solution?

CodePudding user response:

Setup

>>> df1
                   col
2021-09-16  [AAA, BBB]
2021-09-17       [CCC]
2021-09-18       [DDD]

>>> df2
                   col
2021-09-16       [EEE]
2021-09-17       [FFF]
2021-09-18          []
2021-09-19       [GGG]
2021-09-20  [HHH, III]

Solution: DataFrame.align

Align the two dataframes along the index axis and fill the values of missing index with empty list [], then we can easily add the two dataframes using df1 df2

d1, d2 = df1.align(df2, fill_value=[]))
print(d1   d2)

                        col
2021-09-16  [AAA, BBB, EEE]
2021-09-17       [CCC, FFF]
2021-09-18          [DDD, ]
2021-09-19            [GGG]
2021-09-20       [HHH, III]
  • Related