groupby and max in pandas-CodePudding

I have the following DataFrame:

import pandas as pd

data = {'id': ['A', 'B', 'C', 'D', 'E'],
        'c1': [4,7,9,0,3],
        'c2': [1,8,3,0,2]}

data = pd.DataFrame(data)

I want to groupby('id') and take the maximum value among c1 and c2.

The desired output:

data = {'id': ['A', 'B', 'C', 'D','E'],
        'c': [4,8,9,0,3]}

data = pd.DataFrame(data)

CodePudding user response：

First create index by id, get max per rows and then aggregate max if possible id are duplicated values:

df = data.set_index('id').max(axis=1).groupby(level=0).max().reset_index(name='c')
print (df)
  id  c
0  A  4
1  B  8
2  C  9
3  D  0
4  E  3

If id are not duplicated like in sample data remove aggregation:

df = data.set_index('id').max(axis=1).reset_index(name='c')
print (df)
  id  c
0  A  4
1  B  8
2  C  9
3  D  0
4  E  3

CodePudding user response：

Use melt and GroupBy.max:

(data.melt(id_vars='id', value_name='c')
     .groupby('id', as_index=False)['c'].max()
)

output:

CodePudding user response：

There are many ways to do so and you don't necessarily need a groupby, but just a new column based on the values of "c1" and "c2".

I like to use np.where in those cases:

import pandas as pd
import numpy as np

data = {'id': ['A', 'B', 'C', 'D', 'E'],
        'c1': [4,7,9,0,3],
        'c2': [1,8,3,0,2]}

data = pd.DataFrame(data)

data ["c"] = np.where (data["c1"] >= data["c2"], data ["c1"], data["c2"])
del data ["c1"]
del data ["c2"]

The first argument is the condition to evaluate, the second is the return value in case the condition is matched, the third is the "else".