Home > front end >  How to allocate some value to a row on the basis of their sum in different column pandas?
How to allocate some value to a row on the basis of their sum in different column pandas?

Time:04-19

I am currently working with a database where I have to allocate different row entries to specific buckets on the basis of their sum over a metric in python pandas. The below is the sample initial dataframe.

Index    games_played   grade
1                  1    0
2                  2    0
3                  2    0
4                  3    0
5                  3    0
6                  3    0
7                  3    0
8                  4    0
9                  4    0
10                 4    0
11                 4    0
12                 4    0

After sorting the dataset in ascending order on games_played, I want to allocate grade (E,D,C,B,A) to the rows summing to 10% of the sum of games_played. For instance,let the summation of games_played across the dataframe is 800 then the first number of rows that sums <= 8 will be allocated E and then the next number of rows summing to 8 will be graded as D and so on.

Index    games_played   grade
1                   1   E
2                   2   E
3                   2   E
4                   3   E
5                   3   D
6                   3   D
7                   3   C
8                   4   C
9                   4   B
10                  4   B
11                  4   A
12                  4   A

I hope I am clear.

Thank you in advance.

CodePudding user response:

Here is one way to complete this. There is a problem in the grading, as you will notice from the output, from values being greater than 10% of the total sum. This would mean that each of these should be in a separate grade, which is not possible with only 5 (E, D, C, B, A). I have therefore added in a condition to the else which will not "increase" the grade if it is also at the maximum.

import pandas as pd

df = pd.DataFrame({'games_played': [8,9,1,9,3,7,5,2,9,0,6,4,1,7,0,6,3,0,6]})

grades = ['E', 'D', 'C', 'B', 'A']

df = df.sort_values('games_played').reset_index(drop=True)

df['grade'] = 0
x = 0
count = 0
for i in range(0, len(df)):
    if (count   df['games_played'].iloc[i]) < df['games_played'].sum()*0.1:
        count  = df['games_played'].iloc[i]
        df['grade'].iat[i] = grades[x]
    else:
        count = df['games_played'].iloc[i]
        if (len(grades) - 1) > x:
            x  = 1
        df['grade'].iat[i] = grades[x]

df
#Out: 
#    games_played grade
#0              0     E
#1              0     E
#2              0     E
#3              1     E
#4              1     E
#5              2     E
#6              3     E
#7              3     D
#8              4     D
#9              5     C
#10             6     B
#11             6     A
#12             6     A
#13             7     A
#14             7     A
#15             8     A
#16             9     A
#17             9     A
#18             9     A

Hopefully this answers your question.

  • Related