Home > OS >  How to get Normalized values of counts in Pandas against each individual entry in a different column
How to get Normalized values of counts in Pandas against each individual entry in a different column

Time:11-22

I have a DataFrame which looks like this:

df = Pd.DataFrame({'Correct Prediction (Insert None if none of the predictions are correct)':[1,0,1,4,'NONE',1,0,3,2,'NONE'],
                   'Subject':['Physics','Maths','Chemistry','Biology','Physics','Physics','Maths','Biology','Chemistry','Maths']})

So I want to get all the entries to see How much % of each Subject is in 0,1,2,3,4 and NONE. Let us suppose that how much of the Physics are in NONE so I can get the Total Questions which are in NONE AND belong to Physics and then divide by the Total Number of Physics Questions. I can get this the hard way using the code below:

df['Subject'].value_counts()
df[(df['Subject'] == 'Physics') & (df['Correct Prediction (Insert None if none of the predictions are correct)'] == position)].shape[0]

enter image description here

But what is the easy and better way to do this?

I tried

pd.crosstab(df['Correct Prediction (Insert None if none of the predictions are correct)'],df['Subject'], normalize = True)

But it gives me weird values such as 0.1 instead of 0.333

enter image description here

I can do this in loop as:

counts = df['Subject'].value_counts()
for index in counts.index:
    print(f"Results for: {index}\n")
    total_count = counts[index]
    for position in [0,1,2,3,4,'NONE']:
        i = df[(df['Subject'] == index) & (df['Correct Prediction (Insert None if none of the predictions are correct)'] == position)].shape[0]
        print(f"Position {position} : {round((i / total_count)*100, 2)}%")
    print("-"*50,'\n')

CodePudding user response:

Try the following:

correct_prediction = pd.Categorical([df['Correct Prediction (Insert None if none of the predictions are correct)'].tolist(), categories=[0,1,2, 3, 'NONE'])
subject = pd.Categorical(df['Subject'].tolist(), categories=['Physics', 'Maths', 'Chemistry', 'Biology'])

pd.crosstab(correct_prediction, subject, normalize='columns')

enter image description here

  • Related