I have the following pandas data frame in which I stored my Win/Loss results of multiple models and multiple companies
company | Model_1 | Winloss | Model_2 | Winloss2 |
---|---|---|---|---|
Company1 | KNN | W | GPR | L |
Company1 | KNN | L | PLS | W |
Company1 | KNN | L | KRR | W |
Company1 | KNN | L | XGB | W |
Company1 | GPR | L | SGD | W |
Company2 | GPR | L | PLS | W |
Company2 | KRR | L | XGB | W |
I want to group by both company and models and count Win-loss for each model within same company so that I can later unstack the result to and have the output to look like this:
('company', '') | ('DT', 'L') | ('DT', 'W') | ('GPR', 'L') | ('KNN', 'L') | ('KNN', 'W') | ('KRR', 'W') | ('PLS', 'W') | ('SGD', 'W') | ('SVR', 'L') | ('SVR', 'W') |
---|---|---|---|---|---|---|---|---|---|---|
Company1 | 3.0 | 2.0 | 5.0 | 3.0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 1.0 |
Company2 | 6.0 | 2.0 | 0.0 | 2.0 | 1.0 | 0.0 | 0.0 | 0.0 | 6.0 | 1.0 |
Company3 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
Company4 | 6.0 | 1.0 | 5.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
Company5 | 7.0 | 1.0 | 5.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 |
The table above is a result of my follwoing code but the numbers results of counted values were not accurate :
WLPerCompany = WinLoss.groupby(['company','Model_1','Winloss'])
['Winloss'].count()
WinLossResults = pd.DataFrame(WLPerCompany)
WinLossResults.columns = [*WinLossResults.columns[:-1], 'counts']
WinLossResults= WinLossResults['counts'].unstack(level=['Model_1',
'Winloss'])
WinLossResults= WinLossResults.fillna(0)
WinLossResults
CodePudding user response:
Use wide_to_long
for reshape first and then crosstab
:
df = pd.wide_to_long(df.reset_index().rename(columns={'Winloss':'Winloss1'}),
stubnames=['Model_','Winloss'],
i=['index','company'],
j='tmp').reset_index()
df = pd.crosstab(df['company'], [df['Model_'], df['Winloss']])
print (df)
Model_ GPR KNN KRR PLS SGD XGB
Winloss L L W L W W W W
company
Company1 2 3 1 0 1 1 1 1
Company2 1 0 0 1 0 1 0 1