I have the following table:
Name | Age | Data_1 | Data_2 |
---|---|---|---|
Tom | 10 | Test | |
Tom | 10 | | Foo |
Anne | 20 | | Bar |
How can I merge these rows to get this output:
Name | Age | Data_1 | Data_2 |
---|---|---|---|
Tom | 10 | Test | Foo |
Anne | 20 | | Bar |
I tried this code (and some related variants, such as `agg` and grouping by other fields):
```python
import pandas as pd

data = [['tom', 10, 'Test', ''], ['tom', 10, '', 'Foo'], ['Anne', 20, '', 'Bar']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Data_1', 'Data_2'])
df = df.groupby("Name").sum()
print(df)
```
But I only get something like this:
Name | c2 |
---|---|
Anne | Foo |
Tom | Bar |
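One likely source of trouble with the attempt above, independent of how the string columns come out: when grouping by `Name` alone, every other column is aggregated, including `Age`. The minimal sketch below assumes the question's sample data with the stray `1` treated as an empty cell, and looks only at the `Age` column to isolate the effect; exact output formatting may vary between pandas versions.

```python
import pandas as pd

# Sample data from the question; the stray 1 in Data_1 is treated as an empty cell
data = [['Tom', 10, 'Test', ''], ['Tom', 10, '', 'Foo'], ['Anne', 20, '', 'Bar']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Data_1', 'Data_2'])

# Grouping by Name only: Age is not part of the key, so Tom's two ages are added
age_by_name = df.groupby('Name')['Age'].sum()
print(age_by_name)

# Grouping by Name and Age keeps Age in the key; Tom's two rows form one group
group_sizes = df.groupby(['Name', 'Age']).size()
print(group_sizes)
```

This is why both answers keep `Age` in the grouping key alongside `Name`.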
Answer:
Use this if the empty cells are NaN:
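```python
result = (df.set_index(['Name', 'Age'])
          .stack()                    # long format: one row per (Name, Age, column) cell
          .dropna()                   # drop empty cells (newer pandas keeps NaN in stack())
          .groupby(level=[0, 1, 2])   # group by Name, Age and column name
          .apply(''.join)             # concatenate the strings in each group
          .unstack()                  # back to wide format: one column per Data_* field
          .reset_index())
print(result)
```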
Otherwise, first convert the empty strings to NaN with `df.replace('', np.nan, inplace=True)` (this needs `import numpy as np`) before running the code above.
Output:
```
   Name  Age Data_1 Data_2
0  Anne   20    NaN    Bar
1   Tom   10   Test    Foo
```
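A self-contained sketch of this answer, assuming the question's sample data with empty strings in the blank cells (the stray `1` treated as empty). It applies the `replace` step first, then the stack/join pipeline. The `.dropna()` after `stack()` is an addition: newer pandas versions no longer drop NaN in `stack()`, and `''.join` would fail on a NaN value.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with empty strings in the blank cells
data = [['Tom', 10, 'Test', ''], ['Tom', 10, '', 'Foo'], ['Anne', 20, '', 'Bar']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Data_1', 'Data_2'])

# Turn empty strings into NaN so they can be discarded after stacking
df = df.replace('', np.nan)

result = (df.set_index(['Name', 'Age'])
          .stack()
          .dropna()                   # keep only the non-empty cells
          .groupby(level=[0, 1, 2])   # one group per (Name, Age, column)
          .apply(''.join)             # join the surviving strings
          .unstack()                  # columns Data_1 and Data_2 again
          .reset_index())
print(result)
```

Anne has no `Data_1` value at all, so after `unstack()` her cell comes back as NaN, matching the output shown above.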
Answer:
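A `groupby` on both `Name` and `Age` followed by `sum` is enough. `Age` becomes part of the grouping key instead of being added up, and with the empty cells stored as empty strings (as in the question's data), summing a string column concatenates its values, so `'Test' + ''` gives `'Test'`.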
```python
df.groupby(['Name', 'Age']).sum().reset_index()
```
```
   Name  Age Data_1 Data_2
0  Anne   20           Bar
1   tom   10   Test    Foo
```
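A different technique, not taken from either answer, is `GroupBy.first()`, which returns the first non-null value of each column within each group. The sketch below assumes the empty cells are NaN (empty strings are not null, so they would need the `replace` step from the first answer).

```python
import numpy as np
import pandas as pd

# Sample data from the question, with NaN in the blank cells
data = [['Tom', 10, 'Test', np.nan], ['Tom', 10, np.nan, 'Foo'], ['Anne', 20, np.nan, 'Bar']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Data_1', 'Data_2'])

# first() skips nulls, so each column keeps the first non-empty value in its group;
# as_index=False returns Name and Age as ordinary columns
result = df.groupby(['Name', 'Age'], as_index=False).first()
print(result)
```

Unlike `sum`, `first()` does not concatenate: if a group had two non-empty values in the same column, only the first would be kept, which may or may not be what you want.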