I have a data frame like so:
df = pd.DataFrame({
'A':np.random.random(20),
'B':np.random.random(20),
'Gen': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4]
})
Is there a simple way with pandas to find the range (min and max values) in each column by group 'Gen'? In real life, I can have lots of columns A, B, ... (>20). The result could look something like:
Var Gen Min Max
A 1 x x
B 1 x x
A 2 x x
...
CodePudding user response:
You can melt
to reshape, then groupby
agg
to aggregate per group:
(df.melt(id_vars='Gen', var_name='Var')
.groupby(['Var', 'Gen'])['value']
.agg(['min', 'max'])
#.reset_index() # uncomment to have Gen/Var as columns
)
Alternatively, aggregate first, then stack
:
df.groupby('Gen').agg(['min', 'max']).stack(level=0)
output:
min max
Var Gen
A 1 0.009241 0.813241
2 0.142218 0.751106
3 0.005829 0.991914
4 0.033108 0.579354
B 1 0.241094 0.990492
2 0.030335 0.848735
3 0.209076 0.785530
4 0.277970 0.962307