Have a dataframe like this:
df = pd.DataFrame((['1990-01-01','A','S1','2','string2','string3'],
['1990-01-01','A','S2','1','string1','string4'],
['1990-01-01','A','S3','1','string5','string6']),columns=
["date","type","status","count","s1","s2")
date type status count s1 s2
0 1990-01-01 A S1 2 string2 string3
1 1990-01-01 A S2 1 string1 string4
2 1990-01-01 A S3 1 string5 string6
...
I want to get bellow result (each date and each type should have single row, and get the min of s1 column, get max of s2 column)
date type S1 S2 S3 min_s1 max_s2
1990-01-01 A 2 1 1 string1 string6
I tried to use pivot_table
df.pivot_table(index=['date','type'],columns=['status'],values=['count','s1','s2'], aggfunc={
'count':np.sum,
's1': np.min,
's2': np.max
})
But this would only get bellow result, which leads to multiple columns and not the final result.
count s1 s2
status S1 S2 S3 S1 S2 S3 S1 S2 S3
date type
1990-01-01 A 2 1 1 string2 string1 string5 string3 string4 string6
Anyone idea? Thanks.
CodePudding user response:
Looks like you want to combine a pivot
and groupby.agg
:
(df.pivot(index=['date','type'],columns='status', values='count')
.join(df.groupby(['date', 'type']).agg({'s1': 'min', 's2': 'max'}))
.reset_index()
)
output:
date type S1 S2 S3 s1 s2
0 1990-01-01 A 2 1 1 string1 string6