I have three dataframes
df1 :
Date ID Number ID2 info_df1
2021-12-11 1 34 36 60
2021-12-10 2 33 35 57
2021-12-09 3 32 34 80
2021-12-08 4 3133 55
df2:
Date ID Number ID2 info_df2
2021-12-10 2 18 20 50
2021-12-11 1 34 36 89
2021-12-10 2 33 35 40
2021-12-09 3 32 34 92
df3:
Date ID Number ID2 info_df3
2021-12-10 2 18 20 57
2021-12-10 2 18 20 63
2021-12-11 1 34 36 52
2021-12-10 2 33 35 33
I need a data frame with info column from df1,df2 and df3 and Date,ID,Number,ID2 as index.
Format of the merged dataframe should consist these columns:
Date ID Number ID2 info_df1 info_df2
info_df3
CodePudding user response:
If you trying to merge the dataframe based on Date
, I think what you need is merge
function:
mergedDf = df1.merge(df2, on="Date").merge(df3, on="Date");
mergedDf.set_index("ID2", inplace = True)
But if you are trying to merge dataframes based on multiple columns, you can use a list
of column names on the on
argument:
mergedDf = df1.merge(df2, on=["Date", "ID", "ID2"]).merge(df3, on=["Date", "ID", "ID2"]);
mergedDf.set_index("ID2", inplace = True)
CodePudding user response:
Two steps:
- first,
pandas.concat(<DFs-list>)
all those DFs into adf
; - then, define a multi-index with
df.set_index(<col-names-list>)
.
That will do it. Sure, you have to read some docs (here below), but those two steps should be about it.
- https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.set_index.html
- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.set_levels.html
- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.from_frame.html
CodePudding user response:
As others have mentioned, you need to merge the dataframes together. Using the built-in function functools.reduce
, we can do this dynamically (for any number of dataframes) and easily:
i = 0
def func(x, y):
global i
i = 1
return y.merge(x.rename({'info': f'info_df{i 1}'}, axis=1), on=['Date', 'ID', 'Number', 'ID2'], how='outer')
dfs = [df1, df2, df3]
new_df = ft.reduce(func, dfs).rename({'info': 'info_df1'}, axis=1)
Output:
>>> new_df
Date ID Number ID2 info_df1 info_df7 info_df6
0 2021-12-10 2 18 20 57.0 50.0 NaN
1 2021-12-10 2 18 20 63.0 50.0 NaN
2 2021-12-11 1 34 36 52.0 89.0 60.0
3 2021-12-10 2 33 35 33.0 40.0 57.0
4 2021-12-09 3 32 34 NaN 92.0 80.0
5 2021-12-08 4 31 33 NaN NaN 55.0