I try to figure out how to align the rows and columns of many DataFrames
stored in two separate dictionaries
.
To illustrate it, I created four DataFrames
df1
, df2
, df3
, and df4
which I stored in dictionaries
:
df1:
Dates ID1 ID2 ID3 ID4 ID6
0 2021-01-01 0.0 0.0 0.1 0.0 0.0
1 2021-01-03 0.0 0.0 0.0 0.0 0.0
2 2021-01-04 0.1 0.1 0.0 0.1 0.1
3 2021-01-05 0.2 0.2 0.2 0.2 0.2
4 2021-01-06 0.1 0.1 0.4 0.1 0.1
df2:
Dates ID1 ID2 ID3 ID4 ID6
0 2021-01-01 0.00 0.0 0.1 0.0 0.0
1 2021-01-03 0.20 0.3 0.0 0.0 0.0
2 2021-01-04 0.15 -0.1 0.4 -0.1 -0.1
3 2021-01-05 0.20 -0.2 -0.2 0.3 0.2
4 2021-01-06 0.10 0.1 0.2 -0.1 0.1
df3:
Dates ID2 ID4 ID5 ID6
0 2021-01-01 3 1.0 4 2.0
1 2021-01-02 6 2.4 3 1.5
2 2021-01-03 3 -2.0 3 2.0
3 2021-01-04 -1 1.0 -3 -2.0
4 2021-01-05 3 4.0 4 3.0
df4:
Dates ID2 ID4 ID5 ID6
0 2021-01-02 6 2.4 3 1.5
1 2021-01-03 3 -2.0 3 2.0
2 2021-01-04 -1 1.0 -3 -2.0
3 2021-01-07 3 4.0 4 3.0
For reproducibility:
import pandas as pd
df1 = pd.DataFrame({
'Dates':['2021-01-01', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06'],
'ID1':[0,0,0.1,0.2,0.1],
'ID2':[0,0,0.1,0.2,0.1],
'ID3':[0.1,0,0,0.2,0.4],
'ID4':[0,0,0.1,0.2,0.1],
'ID6':[0,0,0.1,0.2,0.1]})
df2 = pd.DataFrame({
'Dates':['2021-01-01', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06'],
'ID1':[0,0.2,0.15,0.2,0.1],
'ID2':[0,0.3,-0.1,-0.2,0.1],
'ID3':[0.1,0,0.4,-0.2,0.2],
'ID4':[0,0,-0.1,0.3,-0.1],
'ID6':[0,0,-0.1,0.2,0.1]})
df3 = pd.DataFrame({
'Dates':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05'],
'ID2':[3,6,3,-1,3],
'ID4':[1,2.4,-2,1,4],
'ID5':[4,3,3,-3,4],
'ID6':[2,1.5,2,-2,3]})
df4 = pd.DataFrame({
'Dates':['2021-01-02', '2021-01-03', '2021-01-04', '2021-01-07'],
'ID2':[6,3,-1,3],
'ID4':[2.4,-2,1,4],
'ID5':[3,3,-3,4],
'ID6':[1.5,2,-2,3]})
dic1 = dict()
dic1['df1'] = df1
dic1['df2'] = df2
dic2 = dict()
dic2['df3'] = df3
dic2['df4'] = df4
What I try to get is:
dic1['df1']:
Dates ID2 ID4 ID6
0 2021-01-03 0.0 0.0 0.0
1 2021-01-04 0.1 0.1 0.1
dic1['df2']:
Dates ID2 ID4 ID6
0 2021-01-03 0.3 0.0 0.0
1 2021-01-04 -0.1 -0.1 -0.1
dic2['df3']:
Dates ID2 ID4 ID6
0 2021-01-03 3 -2 2
1 2021-01-04 -1 1 -2
dic2['df4']:
Dates ID2 ID4 ID6
0 2021-01-03 3 -2 2
1 2021-01-04 -1 1 -2
I try to align the rows and columns of the two dictionaries
so that I only include columns that are contained in all DataFrames
of the different dictionaries
. Is there an efficient command to deal with that in dictionaries
?
CodePudding user response:
Use DataFrame.align
with create DatetmeIndex
in both DataFrame
s:
dic1['df1'], dic2['df2'] = (dic1['df1'].set_index('Dates')
.align(dic2['df2'].set_index('Dates'),
fill_value=0,
join='inner'))
print (dic1['df1'])
ID2 ID4 ID6
Dates
2021-01-01 0.0 0.0 0.0
2021-01-03 0.0 0.0 0.0
2021-01-04 0.1 0.1 0.1
2021-01-05 0.2 0.2 0.2
print (dic2['df2'])
ID2 ID4 ID6
Dates
2021-01-01 3 1.0 2.0
2021-01-03 3 -2.0 2.0
2021-01-04 -1 1.0 -2.0
2021-01-05 3 4.0 3.0
a, b = (dic1['df1'].set_index('Dates')
.align(dic2['df2'].set_index('Dates'),
fill_value=0,
join='inner'))
dic1['df1'], dic2['df2'] = a.reset_index(), b.reset_index()
print (dic1['df1'])
Dates ID2 ID4 ID6
0 2021-01-01 0.0 0.0 0.0
1 2021-01-03 0.0 0.0 0.0
2 2021-01-04 0.1 0.1 0.1
3 2021-01-05 0.2 0.2 0.2
print (dic2['df2'])
Dates ID2 ID4 ID6
0 2021-01-01 3 1.0 2.0
1 2021-01-03 3 -2.0 2.0
2 2021-01-04 -1 1.0 -2.0
3 2021-01-05 3 4.0 3.0
EDIT: First are merged all dictionaries to one big and extract Date
s and columns names to nested lists, last filter values by DataFrame.reindex
:
idx, cols = [],[]
for k, v in {**dic1, **dic2}.items():
cols.append(v.drop('Dates', axis=1).columns)
idx.append(v['Dates'])
from functools import reduce
idx = reduce(lambda x, y: set(x) & set(y), idx)
print (idx)
{'2021-01-04', '2021-01-03'}
cols = reduce(lambda x, y: set(x) & set(y), cols)
print (cols)
{'ID4', 'ID2', 'ID6'}
for k, v in dic1.items():
dic1[k] = dic1[k].set_index('Dates').reindex(index=idx, columns=cols, fill_value=0).reset_index()
for k, v in dic2.items():
dic2[k] = dic2[k].set_index('Dates').reindex(index=idx, columns=cols, fill_value=0).reset_index()
CodePudding user response:
You can compute the intersection of df1 and df2 columns and dates and keep only those:
df1 = pd.DataFrame({
'Dates':['2021-01-01', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06'],
'ID1':[0,0,0.1,0.2,0.1],
'ID2':[0,0,0.1,0.2,0.1],
'ID3':[0.1,0,0,0.2,0.4],
'ID4':[0,0,0.1,0.2,0.1],
'ID6':[0,0,0.1,0.2,0.1]})
df2 = pd.DataFrame({
'Dates':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05'],
'ID2':[3,6,3,-1,3],
'ID4':[1,2.4,-2,1,4],
'ID5':[4,3,3,-3,4],
'ID6':[2,1.5,2,-2,3]})
cols = set(df1.columns).intersection(set(df2.columns))
dates = set(df1["Dates"]).intersection(set(df2["Dates"]))
dic1 = dict()
dic1['df1'] = df1[df1["Dates"].isin(dates)][cols].reset_index()
dic2 = dict()
dic2['df2'] = df2[df2["Dates"].isin(dates)][cols].reset_index()
print(dic1['df1'])
print(dic2['df2'])
Output:
index Dates ID6 ID2 ID4
0 0 2021-01-01 0.0 0.0 0.0
1 1 2021-01-03 0.0 0.0 0.0
2 2 2021-01-04 0.1 0.1 0.1
3 3 2021-01-05 0.2 0.2 0.2
index Dates ID6 ID2 ID4
0 0 2021-01-01 2.0 3 1.0
1 2 2021-01-03 2.0 3 -2.0
2 3 2021-01-04 -2.0 -1 1.0
3 4 2021-01-05 3.0 3 4.0