Home > Net >  Combining two dataframes of dates and values
Combining two dataframes of dates and values

Time:05-24

I have two dataframes loaded from a .csv file. One contains numeric values, the other dates (month-year) for when these numeric values occured. The dates and values are basically mapped to each other. I would like to combine/merge these dataframes to have the dates as the column, and values as the rows. However, as you can see, the dates, though ordered from left to right, they don't all start on the same month.

import pandas as pd

df1 = pd.DataFrame(
    [
        [1, 2, pd.NA, pd.NA, pd.NA],
        [2, 3, 4, pd.NA, pd.NA],
        [4, 5, 6, pd.NA, pd.NA],
        [5, 6, 12, 14, 15]
    ]
)

df2 = pd.DataFrame(
    [
        ["2021-01", "2021-02", pd.NA, pd.NA, pd.NA],
        ["2021-02", "2021-03", "2021-04", pd.NA, pd.NA],
        ["2022-03", "2022-04", "2022-05", pd.NA, pd.NA],
        ["2021-04", "2021-05", "2021-06", "2021-07", "2021-08"]
    ]
)

df1

enter image description here

df2

enter image description here

Although I managed to create the combined dataframe, the dataframes, df1 and df2 contain ~300k rows, and the approach I thought of is rather slow. Is there a more efficient way of achieving the same result?

q = {z: {x: y for x, y in zip(df2.values[z], df1.values[z]) if not pd.isna(y)} for z in range(len(df2))}

df = pd.DataFrame.from_dict(q, orient='index')

idx = pd.to_datetime(df.columns, errors='coerce', format='%Y-%m').argsort()

df.iloc[:, idx]

df3 (result)

enter image description here

CodePudding user response:

You can stack, concat and pivot:

(pd.concat([df1.stack(), df2.stack()], axis=1)
   .reset_index(level=0)
   .pivot(index='level_0', columns=1, values=0)
   .rename_axis(index=None, columns=None)
)

Alternative with unstack:

(pd.concat([df1.stack(), df2.stack()], axis=1)
   .droplevel(1).set_index(1, append=True)
   [0].unstack(1)
   .rename_axis(columns=None)
)

output:

  2021-01 2021-02 2021-03 2021-04 2021-05 2021-06 2021-07 2021-08 2022-03 2022-04 2022-05
0       1       2     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
1     NaN       2       3       4     NaN     NaN     NaN     NaN     NaN     NaN     NaN
2     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN       4       5       6
3     NaN     NaN     NaN       5       6      12      14      15     NaN     NaN     NaN

CodePudding user response:

Use concat with keys parameters, so possible after DataFrame.stack and convert MutiIndex to column use DataFrame.pivot:

df = (pd.concat([df1, df2], axis=1, keys=['a','b'])
        .stack()
        .reset_index()
        .pivot('level_0','b','a'))
print (df)
b       2021-01 2021-02 2021-03 2021-04 2021-05 2021-06 2021-07 2021-08  \
level_0                                                                   
0             1       2     NaN     NaN     NaN     NaN     NaN     NaN   
1           NaN       2       3       4     NaN     NaN     NaN     NaN   
2           NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN   
3           NaN     NaN     NaN       5       6      12      14      15   

b       2022-03 2022-04 2022-05  
level_0                          
0           NaN     NaN     NaN  
1           NaN     NaN     NaN  
2             4       5       6  
3           NaN     NaN     NaN  
  • Related