I have a dataframe 'students' that looks like the following:
Cumulative.GPA Athlete
0 3.9 Yes
1 3.3 Yes
2 4.0 No
3 3.6 Yes
I'm trying to get a value_counts table of GPAs separated into two columns: 1 for athletes and 1 for non-athletes. Output should look like this (with leftmost column obviously continuing down to 0.0)
Cumulative.GPA.athlete Cumulative.GPA.nonathlete
4.0 0 1
3.9 1 0
3.8 0 0
3.7 0 0
3.6 1 0
3.5 0 0
3.4 0 0
3.3 1 0
CodePudding user response:
We can use crosstab
to count the frequencies of Athlete vs non-athlete GPA then reindex
to add missing GPA increments then rename
and rename_axis
to make the output match the expected:
res = (
pd.crosstab(df['Cumulative.GPA'], df['Athlete'])
.reindex(index=np.arange(40, 33 - 1, -1) / 10,
columns=['Yes', 'No'],
fill_value=0)
.rename(columns={'Yes': 'Cumulative.GPA.athlete',
'No': 'Cumulative.GPA.nonathlete'})
.rename_axis(index=None, columns=None)
)
*Using int
steps to avoid any precision issues
res
:
Cumulative.GPA.athlete Cumulative.GPA.nonathlete
4.0 0 1
3.9 1 0
3.8 0 0
3.7 0 0
3.6 1 0
3.5 0 0
3.4 0 0
3.3 1 0
If wanting the shown GPA range to be dynamic, instead of hardcoded bounds, Series.max
and Series.min
can be used to set the reindexing bounds as recommended by U12-Forward:
res = (
pd.crosstab(df['Cumulative.GPA'], df['Athlete'])
.reindex(index=np.arange(int(df['Cumulative.GPA'].max() * 10),
int(df['Cumulative.GPA'].min() * 10) - 1,
-1) / 10,
columns=['Yes', 'No'],
fill_value=0)
.rename(columns={'Yes': 'Cumulative.GPA.athlete',
'No': 'Cumulative.GPA.nonathlete'})
.rename_axis(index=None, columns=None)
)
Setup Used:
import numpy as np
import pandas as pd
df = pd.DataFrame({
'Cumulative.GPA': [3.9, 3.3, 4.0, 3.6],
'Athlete': ['Yes', 'Yes', 'No', 'Yes']
})
CodePudding user response:
Try using get_dummies
with set_index
and reindex
:
>>> pd.get_dummies(df, 'Athlete').set_index('Cumulative.GPA').reindex(np.round(np.arange(df['Cumulative.GPA'].min(), df['Cumulative.GPA'].max(), 0.1), 1)[::-1], fill_value=0).reset_index()
Cumulative.GPA Athlete_No Athlete_Yes
0 4.0 1 0
1 3.9 0 1
2 3.8 0 0
3 3.7 0 0
4 3.6 0 1
5 3.5 0 0
6 3.4 0 0
7 3.3 0 1
>>>