Home > Blockchain >  How can I merge 3 Pandas DataFrames containing the same data except for a few columns?
How can I merge 3 Pandas DataFrames containing the same data except for a few columns?

Time:11-15

I have 3 pd.DataFrames that I need to merge. Each of them contains the same data except for the last 5 columns, and each is 9276 rows x 67 cols. Schematically, they look like this:

enter image description here

MWE of the data:

df1 = pd.DataFrame({"A": [4, 5, 6, 7, 8, 9], "B": [2, 2, 2, 3, 3, 3], "C": [np.nan, np.nan, 5, 5, 6, 6]})
df2 = pd.DataFrame({"A": [4, 5, 6, 7, 8, 9], "B": [2, 2, 2, 3, 3, 3], "C": [4, 4, np.nan, np.nan, 6, 6]})
df3 = pd.DataFrame({"A": [4, 5, 6, 7, 8, 9], "B": [2, 2, 2, 3, 3, 3], "C": [4, 4, 5, 5, np.nan, np.nan]})

expectation = pd.DataFrame({"A": [4, 5, 6, 7, 8, 9], "B": [2, 2, 2, 3, 3, 3], "C": [4.0, 4.0, 5.0, 5.0, 6.0, 6.0]})
print(expectation)

   A  B    C
0  4  2  4.0
1  5  2  4.0
2  6  2  5.0
3  7  3  5.0
4  8  3  6.0
5  9  3  6.0

I've tried pd.merge, pd.concat with their different arguments, but they don't do the job. None of the docs and other questions I've looked at have the same structure as my data; they all use examples with either unique indices or unique columns. Also, in practice my data will still contain np.nans in other parts of the dataframes, which I need to conserve, so I can't dropna or anything similar. How can I merge the dataframes while maintaining the structure of the databases?

CodePudding user response:

Since all of your indexes are the same, the simplest approach would be DataFrame.fillna() with a dataframe argument:

df1.fillna(df2).fillna(df3)

Output:

   A  B    C
0  4  2  4.0
1  5  2  4.0
2  6  2  5.0
3  7  3  5.0
4  8  3  6.0
5  9  3  6.0
  • Related