Count How Many Times Customer Has Purchases-CodePudding

Lets say I have a dataframe like this:

           day  uid  orders     
0   2022-03-15  1      20
1   2022-03-15  2      10
2   2022-03-15  3      50 
3   2022-03-15  4       1 
4   2022-03-16  1      20  
5   2022-03-16  2      10
6   2022-03-16  3      50 
7   2022-03-16  4       1  
8   2022-03-17  1      20 
9   2022-03-17  2      10 
10  2022-03-17  3      50 
11  2022-03-17  4       1  
12  2022-03-18  1      20 
13  2022-03-18  2      10
14  2022-03-18  3      50 
15  2022-03-18  4       1

How can I get a dataframe finding the purchases done by each user id throughout the entire dataframe. Something like

 orders  users_ordered  % of total     
0   1     4              25%
1   20    4              25%
2   50     4              25%

This would mean that, throughtout the days, 4 users have 1 order, 4 users have 20 orders and 4 users have 50 orders.

% of total I think can be calculated via

 df['% of total'] = 100 * df['orders'] / df.groupby('customer__id')['orders'].transform('sum')

If i can just get how to get my target datframe.

Not taken into account: repetitive order values

    orders  users_ordered % of total
0        1             10         3%
1        2             10         3%
2       27             10         3%
3       26             10         3%
4       25             10         3%
5       24             10         3%
6       23             10         3%
7       22             10         3%
8       21             10         3%
9       20             10         3%
10      19             10         3%
11      18             10         3%
12      17             10         3%
13      16             10         3%
14      15             10         3%
15      14             10         3%
16      13             10         3%
17      12             10         3%
18      11             10         3%
19      10             10         3%
20       9             10         3%
21       8             10         3%
22       7             10         3%
23       6             10         3%
24       5             10         3%
25       4             10         3%
26       3             10         3%
27      28             10         3%

CodePudding user response：

Use Series.value_counts:

s = df['orders'].value_counts().rename('users_ordered')
new_df = \
    pd.concat((s,
               s.div(s.sum()).mul(100).astype(int)
                .astype(str).add('%').rename('% of total')), axis=1)\
      .rename_axis(index='orders')\
      .reset_index()
print(new_df)


   orders  users_ordered % of total
0      20              4        25%
1      10              4        25%
2      50              4        25%
3       1              4        25%

Or:

new_df = \
    pd.concat((df['orders'].value_counts().rename('users_ordered'),
               df['orders'].value_counts(normalize=True).mul(100)
                           .astype(int)
                           .astype(str).add('%').rename('% of total')), axis=1)\
      .rename_axis(index='orders')\
      .reset_index()

CodePudding user response：

This works:

new_df = df.groupby('orders')['uid'].count().reset_index(name='users_ordered')
new_df['% of total'] = new_df['users_ordered'].div(new_df['users_ordered'].sum()).mul(100).astype(int).astype(str).add('%')

Output:

>>> new_df
   orders  users_ordered % of total
0       1              4        25%
1      10              4        25%
2      20              4        25%
3      50              4        25%