Home > database >  Get percentage and count in dataframe after group by
Get percentage and count in dataframe after group by

Time:11-13

Let's say i have a dataframe like this:

    name     level    finished
0   name1    TOP         1
1   name1    NON-TOP     1
2   name1    NON-TOP     1
3   name1    TOP         1
4   name1    TOP         0
5   name1    NON-TOP     0
6   name1    TOP         0
7   name1    TOP         1
8   name2    TOP         1
9   name2    TOP         0

I want to group by name and then have two columns for TOP and NON-TOP and then count values. For finished I want to have like a percentage (1 means it is done, 0 means it is not done). In the end, the resulting dataframe should look like this:

   name   top  non_top  finished(%)
0  name1   5      3        62.5 %
1  name2   2      0        50 %

Any idea how can I achieve this?

CodePudding user response:

Let's first calculate the level column. Let's group according to the name column and calculate the distribution of the level column with the value_counts function.

top=df.groupby(["name"]).agg({"level": "value_counts"}).unstack(fill_value=0)
top.columns = top.columns.to_flat_index()
top.columns=[i[1] for i in top.columns]
print(top)
'''
       NON-TOP  TOP
name               
name1        3    5
name2        0    2
'''

now, let's calculate percentages:

finished=df.groupby('name').agg({'finished':'mean'})
finished['finished']=100 - finished['finished'] * 100
print(finished)
'''
           finished
name           
name1      62.5
name2      50.0
'''

finally, combine these two df's.

final=top.join(finished)
print(final)
'''
       NON-TOP  TOP  finished
name                         
name1        3    5      62.5
name2        0    2      50.0
'''
  • Related