I have this dataframe,
|order_id|customername|product_count|
|1 |a |2 |
|2 |b |-1 |
|3 |Q |3 |
|4 |a |-1 |
|5 |c |-1 |
|6 |Q |-1 |
|7 |d |-1 |
What I want is another dataframe with the count of the rows wherever it is 'Q' in customername and count of the rows with the rest of the items in customername. As given below where test2 represents Q and test1 represents rest of the items. Pecentage column is (total request/count of coustomername)*100, which in this case is (5/7)*100 and (2/7)*100
|users|Total request|Percentage|
|test1 |5 | 71.4 |
|test2 |2 | 28.5 |
CodePudding user response:
Compare column for Q
and count by Series.value_counts
, last rename values of index and create DataFrame
:
df = pd.DataFrame({'order_id': [1, 2, 3, 4, 5, 6, 7],
'customername': ['a', 'b', 'Q', 'a', 'c', 'Q', 'd'],
'product_count': [2, -1, 3, -1, -1, -1, -1]})
print (df)
order_id customername product_count
0 1 a 2
1 2 b -1
2 3 Q 3
3 4 a -1
4 5 c -1
5 6 Q -1
6 7 d -1
s = df['customername'].eq('Q').value_counts().rename({True:'test2', False:'test1'})
df1 = s.rename_axis('users').reset_index(name='Total request')
df1['Percentage'] = df1['Total request'].div(df1['Total request'].sum()).mul(100).round(2)
print (df1)
users Total request Percentage
0 test1 5 71.43
1 test2 2 28.57