I have a list of dataframes which have 1 column in common ('label'). However, in some of the dataframes some rows are missing.
Example: df1 = pd.DataFrame([['sample1',2,3], ['sample4',7,8]], columns=['label', 'B', 'E'], index=[1,2]) df2 = pd.DataFrame([['sample1',20,30], ['sample2',70,80], ['sample3',700,800]], columns=['label', 'B', 'C'], index=[2,3,4])
I would like to add rows, so the length of the dfs are the same but preserving the right order. The desired output would be:
label B E
1 sample1 2 3
2 0 0 0
3 0 0 0
4 sample4 7 8
label B C
1 sample1 20 30
2 sample2 70 80
3 sample3 700 800
4 0 0 0
I was looking into pandas three-way joining multiple dataframes on columns but I don't want to merge my dataframes. And pandas align() function : illustrative example doesn't give the desired output either. I was also thinking about comparing the 'label' column with a list and loop through to add the missing rows. If somebody could point me into the right direction, that would be great.
CodePudding user response:
You can get the common indices in the desired order, then reindex
:
# here the order matters to get the preference
# for a sorted order use:
# unique = sorted(pd.concat([df1['label'], df2['label']]).unique())
unique = pd.concat([df2['label'], df1['label']]).unique()
out1 = (df1.set_axis(df1['label'])
.reindex(unique, fill_value=0)
.reset_index(drop=True)
)
out2 = (df2.set_axis(df2['label'])
.reindex(unique, fill_value=0)
.reset_index(drop=True)
)
outputs:
# out1
label B E
0 sample1 2 3
1 0 0 0
2 0 0 0
3 sample4 7 8
# out2
label B C
0 sample1 20 30
1 sample2 70 80
2 sample3 700 800
3 0 0 0