I have the following DataFrame:
import pandas as pd
data = {'id': ['A', 'B', 'C', 'D', 'E'],
        'c1': [4, 7, 9, 0, 3],
        'c2': [1, 8, 3, 0, 2]}
data = pd.DataFrame(data)
I want to groupby('id') and take the maximum value among c1 and c2.
The desired output:
data = {'id': ['A', 'B', 'C', 'D', 'E'],
        'c': [4, 8, 9, 0, 3]}
data = pd.DataFrame(data)
CodePudding user response:
First create an index by id, get the max per row, and then aggregate with max in case the id values are duplicated:
df = data.set_index('id').max(axis=1).groupby(level=0).max().reset_index(name='c')
print (df)
id c
0 A 4
1 B 8
2 C 9
3 D 0
4 E 3
If the id values are not duplicated, as in the sample data, remove the aggregation:
df = data.set_index('id').max(axis=1).reset_index(name='c')
print (df)
id c
0 A 4
1 B 8
2 C 9
3 D 0
4 E 3
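To see why the extra groupby-level aggregation matters, here is a minimal sketch with a hypothetical sample that does contain duplicated id values (not the OP's data):

```python
import pandas as pd

# Hypothetical sample: id 'A' appears twice
dup = pd.DataFrame({'id': ['A', 'A', 'B'],
                    'c1': [4, 7, 9],
                    'c2': [1, 8, 3]})

# Row-wise max gives A -> 4, A -> 8, B -> 9;
# the groupby(level=0).max() then collapses the duplicates to A -> 8, B -> 9
df = dup.set_index('id').max(axis=1).groupby(level=0).max().reset_index(name='c')
print(df)
```

Without the `.groupby(level=0).max()` step, the duplicated 'A' rows would survive into the result.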
CodePudding user response:
Use melt and GroupBy.max:
(data.melt(id_vars='id', value_name='c')
.groupby('id', as_index=False)['c'].max()
)
output:
id c
0 A 4
1 B 8
2 C 9
3 D 0
4 E 3
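The intermediate long format that melt produces may make the step clearer; this sketch prints it before the groupby:

```python
import pandas as pd

data = pd.DataFrame({'id': ['A', 'B', 'C', 'D', 'E'],
                     'c1': [4, 7, 9, 0, 3],
                     'c2': [1, 8, 3, 0, 2]})

# melt stacks c1 and c2 into one long-format value column named 'c',
# producing 10 rows (two per id), plus a 'variable' column
long = data.melt(id_vars='id', value_name='c')
print(long)

# GroupBy.max then picks the larger of the two values per id
out = long.groupby('id', as_index=False)['c'].max()
print(out)
```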
CodePudding user response:
There are many ways to do this, and you don't necessarily need a groupby: just a new column based on the values of "c1" and "c2".
I like to use np.where in those cases:
import pandas as pd
import numpy as np
data = {'id': ['A', 'B', 'C', 'D', 'E'],
        'c1': [4, 7, 9, 0, 3],
        'c2': [1, 8, 3, 0, 2]}
data = pd.DataFrame(data)
data["c"] = np.where(data["c1"] >= data["c2"], data["c1"], data["c2"])
del data["c1"]
del data["c2"]
The first argument is the condition to evaluate, the second is the return value where the condition is matched, and the third is the "else" value.
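The three-argument pattern can be seen in isolation with plain arrays; this is a minimal sketch of np.where's element-wise selection:

```python
import numpy as np

a = np.array([4, 7, 9, 0, 3])
b = np.array([1, 8, 3, 0, 2])

# Where a >= b holds, take the element from a; otherwise take it from b,
# which yields the element-wise maximum of the two arrays
result = np.where(a >= b, a, b)
print(result)
```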