Home > Mobile >  Create dataframe from existing one by counting the rows according to the values in a specific column
Create dataframe from existing one by counting the rows according to the values in a specific column

Time:12-16

I have this dataframe,

|order_id|customername|product_count|
|1       |a          |2            |
|2       |b          |-1           |
|3       |Q          |3            |
|4       |a          |-1           |
|5       |c          |-1           |
|6       |Q          |-1           |
|7       |d          |-1           |

What I want is another dataframe with the count of the rows wherever it is 'Q' in customername and count of the rows with the rest of the items in customername. As given below where test2 represents Q and test1 represents rest of the items. Pecentage column is (total request/count of coustomername)*100, which in this case is (5/7)*100 and (2/7)*100

|users|Total request|Percentage|
|test1 |5         |  71.4      |
|test2 |2         |  28.5      |

CodePudding user response:

Compare column for Q and count by Series.value_counts, last rename values of index and create DataFrame:

df = pd.DataFrame({'order_id': [1, 2, 3, 4, 5, 6, 7], 
                   'customername': ['a', 'b', 'Q', 'a', 'c', 'Q', 'd'],
                   'product_count': [2, -1, 3, -1, -1, -1, -1]})
print (df)
   order_id customername  product_count
0         1            a              2
1         2            b             -1
2         3            Q              3
3         4            a             -1
4         5            c             -1
5         6            Q             -1
6         7            d             -1

s = df['customername'].eq('Q').value_counts().rename({True:'test2', False:'test1'})

df1 = s.rename_axis('users').reset_index(name='Total request')
df1['Percentage'] = df1['Total request'].div(df1['Total request'].sum()).mul(100).round(2)
print (df1)
   users  Total request  Percentage
0  test1              5       71.43
1  test2              2       28.57
  • Related