Let's say i have a dataframe like this:
name level finished
0 name1 TOP 1
1 name1 NON-TOP 1
2 name1 NON-TOP 1
3 name1 TOP 1
4 name1 TOP 0
5 name1 NON-TOP 0
6 name1 TOP 0
7 name1 TOP 1
8 name2 TOP 1
9 name2 TOP 0
I want to group by name and then have two columns for TOP and NON-TOP and then count values. For finished I want to have like a percentage (1 means it is done, 0 means it is not done). In the end, the resulting dataframe should look like this:
name top non_top finished(%)
0 name1 5 3 62.5 %
1 name2 2 0 50 %
Any idea how can I achieve this?
CodePudding user response:
Let's first calculate the level column. Let's group according to the name column and calculate the distribution of the level column with the value_counts function.
top=df.groupby(["name"]).agg({"level": "value_counts"}).unstack(fill_value=0)
top.columns = top.columns.to_flat_index()
top.columns=[i[1] for i in top.columns]
print(top)
'''
NON-TOP TOP
name
name1 3 5
name2 0 2
'''
now, let's calculate percentages:
finished=df.groupby('name').agg({'finished':'mean'})
finished['finished']=100 - finished['finished'] * 100
print(finished)
'''
finished
name
name1 62.5
name2 50.0
'''
finally, combine these two df's.
final=top.join(finished)
print(final)
'''
NON-TOP TOP finished
name
name1 3 5 62.5
name2 0 2 50.0
'''