I'm currently working on a Python/pandas project.
I have this DataFrame:
I want to reduce this DataFrame to one row per Name with the sum of Damage. That part works.
The real issue is that I also want to keep all the other columns. Ally can only be "T", and EncId is always the same, so those are easy to deal with. Duration and Job are another story.
Here I want to keep the Job if it is not 0, and keep the longest Duration, for example.
I really can't figure out how to handle this; I am missing some methodology.
Thanks in advance for your time :)
CodePudding user response:
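Use groupby with agg, choosing one aggregation function per column:
# 'first' for columns that are constant within a group, 'max' for the longest
# Duration, 'sum' for total Damage. 'max' on Job picks 'Smn' over '0' because
# the string '0' sorts before letters.
df = (df.groupby('Name', as_index=False)
        .agg({'EncId': 'first', 'Ally': 'first', 'Name': 'first',
              'Duration': 'max', 'Job': 'max', 'Damage': 'sum'}))
Output:
>>> df
      EncId  Ally          Name  Duration  Job   Damage
0  91513775     T  Naaru Segawa       203  Smn  2789360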
Setup:
import pandas as pd

df = pd.DataFrame(
    {'EncId': [91513775, 91513775],
     'Ally': ['T', 'T'],
     'Name': ['Naaru Segawa', 'Naaru Segawa'],
     'Duration': [191, 203],
     'Job': ['0', 'Smn'],
     'Damage': [514680, 2274680]})
print(df)
# Output:
#       EncId  Ally          Name  Duration  Job   Damage
# 0  91513775     T  Naaru Segawa       191    0   514680
# 1  91513775     T  Naaru Segawa       203  Smn  2274680
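If relying on 'max' for Job feels fragile (it only picks the real job because the string '0' happens to sort before letters), a small custom aggregation can state the rule directly. This is a minimal sketch, assuming Job is stored as strings with '0' as the placeholder; first_real_job is a hypothetical helper name, not part of pandas, and Damage is summed as the question asks:

```python
import pandas as pd

# Same sample data as in the setup above.
df = pd.DataFrame(
    {'EncId': [91513775, 91513775],
     'Ally': ['T', 'T'],
     'Name': ['Naaru Segawa', 'Naaru Segawa'],
     'Duration': [191, 203],
     'Job': ['0', 'Smn'],
     'Damage': [514680, 2274680]})


def first_real_job(jobs):
    """Hypothetical helper: first Job in the group that is not the '0' placeholder."""
    real = jobs[jobs != '0']
    # Fall back to '0' when the group has no real job at all.
    return real.iloc[0] if not real.empty else '0'


# Named aggregation: output_column=(input_column, function).
result = (
    df.groupby('Name', as_index=False)
      .agg(EncId=('EncId', 'first'),
           Ally=('Ally', 'first'),
           Duration=('Duration', 'max'),
           Job=('Job', first_real_job),
           Damage=('Damage', 'sum'))
)
print(result)
```

With named aggregation the grouping column Name comes first in the result, so reorder the columns afterwards if the original order matters.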