Do a groupby and a sum while keeping the other columns intact in a Pandas DataFrame

Time:10-14

I'm currently working on a Python/Pandas project.

I have this DataFrame (originally posted as a screenshot) with the columns EncId, Ally, Name, Duration, Job and Damage.

I want to reduce this DataFrame to one row per Name, with Damage summed. That part already works.
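A minimal sketch of the part that already works, assuming the data looks like the two-row sample from the answer's setup (the asker's own code is not shown). A plain groupby-sum keeps only the grouping key and the summed column, which is why the other columns disappear:

```python
import pandas as pd

# Sample data copied from the answer's setup (assumed shape of the real DataFrame)
df = pd.DataFrame({
    'EncId': [91513775, 91513775],
    'Ally': ['T', 'T'],
    'Name': ['Naaru Segawa', 'Naaru Segawa'],
    'Duration': [191, 203],
    'Job': ['0', 'Smn'],
    'Damage': [514680, 2274680],
})

# Sum Damage per Name: every column other than Name and Damage is dropped
totals = df.groupby('Name', as_index=False)['Damage'].sum()
print(totals)
```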

The real issue is that I also want to keep all the other columns. Ally can only be "T", and EncId is always the same, so those are easy to deal with. Duration and Job are another story.

For those, I want to keep the Job that is not 0 and the longest Duration.

I really can't figure out how to handle this; I'm missing the right approach.

Thanks in advance for your time :)

Answer:

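Use groupby with agg, giving each column its own aggregation. Don't list Name in the dict (it is already the grouping key), and use 'sum' for Damage since you want the total. Job can use 'max' because the string '0' sorts before letters, so any real job name wins over '0':

df = (df.groupby('Name', as_index=False)
        .agg({'EncId': 'first', 'Ally': 'first',
              'Duration': 'max', 'Job': 'max', 'Damage': 'sum'})
        [df.columns])  # restore the original column order

Output:

>>> df
      EncId Ally          Name  Duration  Job   Damage
0  91513775    T  Naaru Segawa       203  Smn  2789360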
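The same aggregation can also be written with named aggregation (keyword arguments of the form output_column=(input_column, aggregation) passed to agg, available since pandas 0.25), which makes the per-column rule explicit. A sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'EncId': [91513775, 91513775],
    'Ally': ['T', 'T'],
    'Name': ['Naaru Segawa', 'Naaru Segawa'],
    'Duration': [191, 203],
    'Job': ['0', 'Smn'],
    'Damage': [514680, 2274680],
})

# Each keyword names an output column and says which input column and
# aggregation produce it; reset_index() turns Name back into a column.
out = df.groupby('Name').agg(
    EncId=('EncId', 'first'),
    Ally=('Ally', 'first'),
    Duration=('Duration', 'max'),   # longest duration
    Job=('Job', 'max'),             # '0' sorts before letters, so 'Smn' wins
    Damage=('Damage', 'sum'),       # total damage
).reset_index()
print(out)
```

The output columns appear in keyword order after Name, so no extra reordering step is needed.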

Setup:

import pandas as pd

df = pd.DataFrame(
  {'EncId': [91513775, 91513775],
   'Ally': ['T', 'T'],
   'Name': ['Naaru Segawa', 'Naaru Segawa'],
   'Duration': [191, 203],
   'Job': ['0', 'Smn'],
   'Damage': [514680, 2274680]})
print(df)

# Output:
      EncId Ally          Name  Duration  Job   Damage
0  91513775    T  Naaru Segawa       191    0   514680
1  91513775    T  Naaru Segawa       203  Smn  2274680
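Taking 'max' of Job depends on '0' sorting before job names. A more explicit sketch of "keep the Job if not 0", assuming that a Job of '0' (string or integer) means "unknown": mask those values to NaN first, then aggregate Job with 'first', which skips missing values within each group.

```python
import pandas as pd

df = pd.DataFrame({
    'EncId': [91513775, 91513775],
    'Ally': ['T', 'T'],
    'Name': ['Naaru Segawa', 'Naaru Segawa'],
    'Duration': [191, 203],
    'Job': ['0', 'Smn'],
    'Damage': [514680, 2274680],
})

# Replace Job values equal to '0' (or 0) with NaN; GroupBy.first then
# returns the first non-missing Job in each group.
out = (df.assign(Job=df['Job'].mask(df['Job'].astype(str) == '0'))
         .groupby('Name', as_index=False)
         .agg({'EncId': 'first', 'Ally': 'first', 'Duration': 'max',
               'Job': 'first', 'Damage': 'sum'}))
print(out)
```

If every row of a group has Job '0', that group's Job comes out as NaN; chaining .fillna({'Job': '0'}) would put the '0' back.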