I am currently working with a database where I have to allocate different row entries to specific buckets on the basis of their sum over a metric in python pandas. The below is the sample initial dataframe.
Index games_played grade
1 1 0
2 2 0
3 2 0
4 3 0
5 3 0
6 3 0
7 3 0
8 4 0
9 4 0
10 4 0
11 4 0
12 4 0
After sorting the dataset in ascending order on games_played, I want to allocate grade (E,D,C,B,A) to the rows summing to 10% of the sum of games_played. For instance,let the summation of games_played across the dataframe is 800 then the first number of rows that sums <= 8 will be allocated E and then the next number of rows summing to 8 will be graded as D and so on.
Index games_played grade
1 1 E
2 2 E
3 2 E
4 3 E
5 3 D
6 3 D
7 3 C
8 4 C
9 4 B
10 4 B
11 4 A
12 4 A
I hope I am clear.
Thank you in advance.
CodePudding user response:
Here is one way to complete this. There is a problem in the grading, as you will notice from the output, from values being greater than 10% of the total sum. This would mean that each of these should be in a separate grade, which is not possible with only 5 (E, D, C, B, A). I have therefore added in a condition to the else
which will not "increase" the grade if it is also at the maximum.
import pandas as pd
df = pd.DataFrame({'games_played': [8,9,1,9,3,7,5,2,9,0,6,4,1,7,0,6,3,0,6]})
grades = ['E', 'D', 'C', 'B', 'A']
df = df.sort_values('games_played').reset_index(drop=True)
df['grade'] = 0
x = 0
count = 0
for i in range(0, len(df)):
if (count df['games_played'].iloc[i]) < df['games_played'].sum()*0.1:
count = df['games_played'].iloc[i]
df['grade'].iat[i] = grades[x]
else:
count = df['games_played'].iloc[i]
if (len(grades) - 1) > x:
x = 1
df['grade'].iat[i] = grades[x]
df
#Out:
# games_played grade
#0 0 E
#1 0 E
#2 0 E
#3 1 E
#4 1 E
#5 2 E
#6 3 E
#7 3 D
#8 4 D
#9 5 C
#10 6 B
#11 6 A
#12 6 A
#13 7 A
#14 7 A
#15 8 A
#16 9 A
#17 9 A
#18 9 A
Hopefully this answers your question.