I have the following three tables as pandas DataFrames (df1, df2 and df3):
ID1 | ID2 | Value1 |
---|---|---|
Data1 | Data2 | Data11 |

ID1 | ID2 | Value2 |
---|---|---|
Data1 | Data2 | Data12 |

ID1 | ID2 | Value3 |
---|---|---|
Data1 | Data2 | Data13 |

My goal is a single table of the form:
ID1 | ID2 | Value1 | Value2 | Value3 |
---|---|---|---|---|
Data1 | Data2 | Data11 | Data12 | Data13 |
What is the easiest way to do that in Python?
Answer:
Perform multiple merges in a chain:
df1.merge(df2.merge(df3))
# ID1 ID2 Value1 Value2 Value3
# 0 Data1 Data2 Data11 Data12 Data13
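A self-contained sketch of the chained merge, assuming each frame holds only the single row shown in the question. Without an on= argument, merge joins on all shared columns (here ID1 and ID2); the version below spells the keys out and chains left to right, which gives the same result as the nested call. Note that the default how="inner" drops any key that is missing from one of the frames.

```python
import pandas as pd

# Rebuild the three single-row frames from the question
df1 = pd.DataFrame({"ID1": ["Data1"], "ID2": ["Data2"], "Value1": ["Data11"]})
df2 = pd.DataFrame({"ID1": ["Data1"], "ID2": ["Data2"], "Value2": ["Data12"]})
df3 = pd.DataFrame({"ID1": ["Data1"], "ID2": ["Data2"], "Value3": ["Data13"]})

# Nested form from the answer: joins on the common columns ID1 and ID2
nested = df1.merge(df2.merge(df3))

# Equivalent left-to-right chain with the join keys made explicit
keys = ["ID1", "ID2"]
chained = df1.merge(df2, on=keys).merge(df3, on=keys)

print(chained)
```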
Or with functools.reduce:
from functools import reduce
reduce(lambda L, R: L.merge(R, on=['ID1', 'ID2']), [df1, df2, df3])
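The reduce form scales to any number of frames because it folds the list pairwise, i.e. merge(merge(df1, df2), df3). The sketch below is a variant of the answer's one-liner: it swaps in how="outer" (an assumption, not part of the original answer) so that keys present in only some of the frames are kept with NaN instead of being dropped; the names frames and keys are illustrative.

```python
from functools import reduce

import pandas as pd

# The three frames from the question, in the order they should be joined
frames = [
    pd.DataFrame({"ID1": ["Data1"], "ID2": ["Data2"], "Value1": ["Data11"]}),
    pd.DataFrame({"ID1": ["Data1"], "ID2": ["Data2"], "Value2": ["Data12"]}),
    pd.DataFrame({"ID1": ["Data1"], "ID2": ["Data2"], "Value3": ["Data13"]}),
]
keys = ["ID1", "ID2"]

# Fold left to right; the outer join keeps unmatched keys (filled with NaN)
wide = reduce(lambda left, right: left.merge(right, on=keys, how="outer"), frames)

print(wide)
```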
Answer:
import pandas as pd
new_df = pd.concat([df1, df2, df3]).groupby(['ID1', 'ID2'], as_index=False).first()
Output:
>>> new_df
ID1 ID2 Value1 Value2 Value3
0 Data1 Data2 Data11 Data12 Data13
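Why this works, sketched on the question's data: concat stacks the frames into three rows, and each row has NaN in the Value columns its source frame lacked. groupby(...).first() then takes the first non-null value of each column within every (ID1, ID2) group, so the three partial rows collapse into one.

```python
import pandas as pd

df1 = pd.DataFrame({"ID1": ["Data1"], "ID2": ["Data2"], "Value1": ["Data11"]})
df2 = pd.DataFrame({"ID1": ["Data1"], "ID2": ["Data2"], "Value2": ["Data12"]})
df3 = pd.DataFrame({"ID1": ["Data1"], "ID2": ["Data2"], "Value3": ["Data13"]})

# Three rows; the union of columns is filled with NaN where a frame had none
stacked = pd.concat([df1, df2, df3], ignore_index=True)

# first() skips NaN column by column, merging the partial rows per group
new_df = stacked.groupby(["ID1", "ID2"], as_index=False).first()

print(stacked)
print(new_df)
```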
Answer:
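It's very similar to @user17242583's answer. The only difference is that nth(0) returns the first row of each group as it is stored, NaNs included, while first() takes the first non-null value of each column separately. Note that after the concat the first row of the group comes from df1, so here nth(0) leaves Value2 and Value3 as NaN rather than producing the target table.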
import pandas as pd
df_new = pd.concat([df1, df2, df3]).groupby(['ID1', 'ID2'], as_index=False).nth(0)
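A small side-by-side sketch on the question's data makes the difference concrete. The stacked frame has three partial rows per (ID1, ID2) group; first() fills every column, whereas nth(0) returns the df1 row unchanged. In recent pandas versions (2.0 and later), nth behaves as a filter and keeps the original row index, so the values are accessed by column name below.

```python
import pandas as pd

df1 = pd.DataFrame({"ID1": ["Data1"], "ID2": ["Data2"], "Value1": ["Data11"]})
df2 = pd.DataFrame({"ID1": ["Data1"], "ID2": ["Data2"], "Value2": ["Data12"]})
df3 = pd.DataFrame({"ID1": ["Data1"], "ID2": ["Data2"], "Value3": ["Data13"]})

stacked = pd.concat([df1, df2, df3], ignore_index=True)
grouped = stacked.groupby(["ID1", "ID2"], as_index=False)

# first(): first non-null value of each column within the group
via_first = grouped.first()

# nth(0): the group's first row exactly as stored, NaNs included
via_nth = grouped.nth(0)

print(via_first)
print(via_nth)
```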