Home > Software engineering >  Align the rows and columns of DataFrames between two Dictionaries
Align the rows and columns of DataFrames between two Dictionaries

Time:12-15

I try to figure out how to align the rows and columns of many DataFrames stored in two separate dictionaries.

To illustrate it, I created four DataFrames df1, df2, df3, and df4 which I stored in dictionaries:

df1:
    Dates       ID1  ID2  ID3   ID4  ID6
0   2021-01-01  0.0  0.0  0.1   0.0  0.0
1   2021-01-03  0.0  0.0  0.0   0.0  0.0
2   2021-01-04  0.1  0.1  0.0   0.1  0.1
3   2021-01-05  0.2  0.2  0.2   0.2  0.2
4   2021-01-06  0.1  0.1  0.4   0.1  0.1

df2:
    Dates       ID1    ID2   ID3   ID4   ID6
0   2021-01-01  0.00   0.0   0.1   0.0   0.0
1   2021-01-03  0.20   0.3   0.0   0.0   0.0
2   2021-01-04  0.15   -0.1  0.4   -0.1  -0.1
3   2021-01-05  0.20   -0.2  -0.2   0.3  0.2
4   2021-01-06  0.10   0.1   0.2   -0.1  0.1

df3:
    Dates       ID2   ID4    ID5   ID6
0   2021-01-01  3     1.0    4     2.0
1   2021-01-02  6     2.4    3     1.5
2   2021-01-03  3     -2.0   3     2.0
3   2021-01-04  -1    1.0    -3    -2.0
4   2021-01-05  3     4.0    4     3.0

df4:
    Dates       ID2   ID4   ID5    ID6
0   2021-01-02  6     2.4   3     1.5
1   2021-01-03  3     -2.0  3     2.0
2   2021-01-04  -1    1.0   -3    -2.0
3   2021-01-07  3     4.0   4     3.0

For reproducibility:

import pandas as pd
df1 = pd.DataFrame({
    'Dates':['2021-01-01', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06'],
    'ID1':[0,0,0.1,0.2,0.1],
    'ID2':[0,0,0.1,0.2,0.1],
    'ID3':[0.1,0,0,0.2,0.4],
    'ID4':[0,0,0.1,0.2,0.1], 
    'ID6':[0,0,0.1,0.2,0.1]})

df2 = pd.DataFrame({
    'Dates':['2021-01-01', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06'],
    'ID1':[0,0.2,0.15,0.2,0.1],
    'ID2':[0,0.3,-0.1,-0.2,0.1],
    'ID3':[0.1,0,0.4,-0.2,0.2],
    'ID4':[0,0,-0.1,0.3,-0.1], 
    'ID6':[0,0,-0.1,0.2,0.1]})

df3 = pd.DataFrame({
    'Dates':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05'], 
    'ID2':[3,6,3,-1,3],
    'ID4':[1,2.4,-2,1,4],
    'ID5':[4,3,3,-3,4],
    'ID6':[2,1.5,2,-2,3]})

df4 = pd.DataFrame({
    'Dates':['2021-01-02', '2021-01-03', '2021-01-04', '2021-01-07'], 
    'ID2':[6,3,-1,3],
    'ID4':[2.4,-2,1,4],
    'ID5':[3,3,-3,4],
    'ID6':[1.5,2,-2,3]})

dic1 = dict()
dic1['df1'] = df1
dic1['df2'] = df2

dic2 = dict()
dic2['df3'] = df3
dic2['df4'] = df4

What I try to get is:

dic1['df1']:
    Dates       ID2  ID4    ID6
0   2021-01-03  0.0  0.0    0.0
1   2021-01-04  0.1  0.1    0.1

dic1['df2']:
    Dates       ID2   ID4   ID6
0   2021-01-03  0.3   0.0   0.0
1   2021-01-04  -0.1  -0.1  -0.1

dic2['df3']:
    Dates       ID2  ID4   ID6
0   2021-01-03  3    -2    2
1   2021-01-04  -1   1     -2

dic2['df4']:
    Dates       ID2  ID4   ID6
0   2021-01-03  3    -2    2
1   2021-01-04  -1   1     -2

I try to align the rows and columns of the two dictionaries so that I only include columns that are contained in all DataFrames of the different dictionaries. Is there an efficient command to deal with that in dictionaries?

CodePudding user response:

Use DataFrame.align with create DatetmeIndex in both DataFrames:

dic1['df1'], dic2['df2'] = (dic1['df1'].set_index('Dates')
                              .align(dic2['df2'].set_index('Dates'),
                                     fill_value=0, 
                                     join='inner'))
print (dic1['df1'])
            ID2  ID4  ID6
Dates                    
2021-01-01  0.0  0.0  0.0
2021-01-03  0.0  0.0  0.0
2021-01-04  0.1  0.1  0.1
2021-01-05  0.2  0.2  0.2

print (dic2['df2'])
            ID2  ID4  ID6
Dates                    
2021-01-01    3  1.0  2.0
2021-01-03    3 -2.0  2.0
2021-01-04   -1  1.0 -2.0
2021-01-05    3  4.0  3.0

a, b = (dic1['df1'].set_index('Dates')
               .align(dic2['df2'].set_index('Dates'), 
                      fill_value=0,
                      join='inner'))

dic1['df1'], dic2['df2'] = a.reset_index(), b.reset_index()
print (dic1['df1'])
        Dates  ID2  ID4  ID6
0  2021-01-01  0.0  0.0  0.0
1  2021-01-03  0.0  0.0  0.0
2  2021-01-04  0.1  0.1  0.1
3  2021-01-05  0.2  0.2  0.2

print (dic2['df2'])
        Dates  ID2  ID4  ID6
0  2021-01-01    3  1.0  2.0
1  2021-01-03    3 -2.0  2.0
2  2021-01-04   -1  1.0 -2.0
3  2021-01-05    3  4.0  3.0

EDIT: First are merged all dictionaries to one big and extract Dates and columns names to nested lists, last filter values by DataFrame.reindex:

idx, cols = [],[]
for k, v in {**dic1, **dic2}.items():
    cols.append(v.drop('Dates', axis=1).columns)
    idx.append(v['Dates'])
    
from functools import reduce
idx = reduce(lambda x, y: set(x) & set(y), idx)
print (idx)
{'2021-01-04', '2021-01-03'}

cols = reduce(lambda x, y: set(x) & set(y), cols)
print (cols)
{'ID4', 'ID2', 'ID6'}

for k, v in dic1.items():
    dic1[k] = dic1[k].set_index('Dates').reindex(index=idx, columns=cols, fill_value=0).reset_index()

for k, v in dic2.items():
    dic2[k] = dic2[k].set_index('Dates').reindex(index=idx, columns=cols, fill_value=0).reset_index()

CodePudding user response:

You can compute the intersection of df1 and df2 columns and dates and keep only those:

df1 = pd.DataFrame({
    'Dates':['2021-01-01', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06'],
    'ID1':[0,0,0.1,0.2,0.1],
    'ID2':[0,0,0.1,0.2,0.1],
    'ID3':[0.1,0,0,0.2,0.4],
    'ID4':[0,0,0.1,0.2,0.1], 
    'ID6':[0,0,0.1,0.2,0.1]})

df2 = pd.DataFrame({
    'Dates':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05'], 
    'ID2':[3,6,3,-1,3],
    'ID4':[1,2.4,-2,1,4],
    'ID5':[4,3,3,-3,4],
    'ID6':[2,1.5,2,-2,3]})

cols = set(df1.columns).intersection(set(df2.columns))
dates = set(df1["Dates"]).intersection(set(df2["Dates"]))

dic1 = dict()
dic1['df1'] = df1[df1["Dates"].isin(dates)][cols].reset_index()

dic2 = dict()
dic2['df2'] = df2[df2["Dates"].isin(dates)][cols].reset_index()

print(dic1['df1'])
print(dic2['df2'])

Output:

   index       Dates  ID6  ID2  ID4
0      0  2021-01-01  0.0  0.0  0.0
1      1  2021-01-03  0.0  0.0  0.0
2      2  2021-01-04  0.1  0.1  0.1
3      3  2021-01-05  0.2  0.2  0.2
   index       Dates  ID6  ID2  ID4
0      0  2021-01-01  2.0    3  1.0
1      2  2021-01-03  2.0    3 -2.0
2      3  2021-01-04 -2.0   -1  1.0
3      4  2021-01-05  3.0    3  4.0
  • Related