lst = [
['s001','b1','typeA'],['s002','b1','typeB'],['s003','b1','typeC'],['s004','b1','typeD'],
['s005','b1','typeA'],['s006','b1','typeB'],['s007','b1','typeC'],['s008','b1','typeD'],
['s009','b2','typeA'],['s010','b2','typeB'],['s011','b2','typeC']
]
df=pd.DataFrame(lst,columns=['sn','setting','status'])
sn setting status
0 s001 b1 typeA
1 s002 b1 typeB
2 s003 b1 typeC
3 s004 b1 typeD
4 s005 b1 typeA
5 s006 b1 typeB
6 s007 b1 typeC
7 s008 b1 typeD
8 s009 b2 typeA
9 s010 b2 typeB
10 s011 b2 typeC
(each row on sn is unique)
I can use group by to get the info.
df.groupby(['setting','status']).size().reset_index()
setting status 0
0 b1 typeA 2
1 b1 typeB 2
2 b1 typeC 2
3 b1 typeD 2
4 b2 typeA 1
5 b2 typeB 1
6 b2 typeC 1
But I prefer to group them by setting column and count total & each status number, like bellow format:
setting total tppeA typeB typeC typeD
b1 8 2 2 2 2
b2 3 1 1 1 0
(typeA to typeD are known type names, but a given dataset would not always have all those 4 unique types in it).
But I don't know how to convert them to columns (for total column, I can plus 4 types status)
CodePudding user response:
Let us do
out = pd.crosstab(df.setting,df.status,margins = True,margins_name = 'Total').drop(['Total']) # reset_index()
Out[97]:
status typeA typeB typeC typeD Total
setting
b1 2 2 2 2 8
b2 1 1 1 0 3
CodePudding user response:
use pivot
df2=df.groupby(['setting','status']).size().reset_index().pivot_table(index='setting', columns='status', values=0).fillna(0)
df2['total'] = df2.sum(axis=1)
df2
status typeA typeB typeC typeD total
setting
b1 2.0 2.0 2.0 2.0 8.0
b2 1.0 1.0 1.0 0.0 3.0