I have 3 pd.DataFrame
s that I need to merge. Each of them contains the same data except for the last 5 columns, and each is 9276 rows x 67 cols. Schematically, they look like this:
MWE of the data:
df1 = pd.DataFrame({"A": [4, 5, 6, 7, 8, 9], "B": [2, 2, 2, 3, 3, 3], "C": [np.nan, np.nan, 5, 5, 6, 6]})
df2 = pd.DataFrame({"A": [4, 5, 6, 7, 8, 9], "B": [2, 2, 2, 3, 3, 3], "C": [4, 4, np.nan, np.nan, 6, 6]})
df3 = pd.DataFrame({"A": [4, 5, 6, 7, 8, 9], "B": [2, 2, 2, 3, 3, 3], "C": [4, 4, 5, 5, np.nan, np.nan]})
expectation = pd.DataFrame({"A": [4, 5, 6, 7, 8, 9], "B": [2, 2, 2, 3, 3, 3], "C": [4.0, 4.0, 5.0, 5.0, 6.0, 6.0]})
print(expectation)
A B C
0 4 2 4.0
1 5 2 4.0
2 6 2 5.0
3 7 3 5.0
4 8 3 6.0
5 9 3 6.0
I've tried pd.merge
, pd.concat
with their different arguments, but they don't do the job. None of the docs and other questions I've looked at have the same structure as my data; they all use examples with either unique indices or unique columns. Also, in practice my data will still contain np.nan
s in other parts of the dataframes, which I need to conserve, so I can't dropna
or anything similar. How can I merge the dataframes while maintaining the structure of the databases?
CodePudding user response:
Since all of your indexes are the same, the simplest approach would be DataFrame.fillna() with a dataframe argument:
df1.fillna(df2).fillna(df3)
Output:
A B C
0 4 2 4.0
1 5 2 4.0
2 6 2 5.0
3 7 3 5.0
4 8 3 6.0
5 9 3 6.0