Lets say I have a dataframe like this:
day uid orders
0 2022-03-15 1 20
1 2022-03-15 2 10
2 2022-03-15 3 50
3 2022-03-15 4 1
4 2022-03-16 1 20
5 2022-03-16 2 10
6 2022-03-16 3 50
7 2022-03-16 4 1
8 2022-03-17 1 20
9 2022-03-17 2 10
10 2022-03-17 3 50
11 2022-03-17 4 1
12 2022-03-18 1 20
13 2022-03-18 2 10
14 2022-03-18 3 50
15 2022-03-18 4 1
How can I get a dataframe finding the purchases done by each user id throughout the entire dataframe. Something like
orders users_ordered % of total
0 1 4 25%
1 20 4 25%
2 50 4 25%
This would mean that, throughtout the days, 4 users have 1 order, 4 users have 20 orders and 4 users have 50 orders.
% of total I think can be calculated via
df['% of total'] = 100 * df['orders'] / df.groupby('customer__id')['orders'].transform('sum')
If i can just get how to get my target datframe.
Not taken into account: repetitive order values
orders users_ordered % of total
0 1 10 3%
1 2 10 3%
2 27 10 3%
3 26 10 3%
4 25 10 3%
5 24 10 3%
6 23 10 3%
7 22 10 3%
8 21 10 3%
9 20 10 3%
10 19 10 3%
11 18 10 3%
12 17 10 3%
13 16 10 3%
14 15 10 3%
15 14 10 3%
16 13 10 3%
17 12 10 3%
18 11 10 3%
19 10 10 3%
20 9 10 3%
21 8 10 3%
22 7 10 3%
23 6 10 3%
24 5 10 3%
25 4 10 3%
26 3 10 3%
27 28 10 3%
CodePudding user response:
Use Series.value_counts
:
s = df['orders'].value_counts().rename('users_ordered')
new_df = \
pd.concat((s,
s.div(s.sum()).mul(100).astype(int)
.astype(str).add('%').rename('% of total')), axis=1)\
.rename_axis(index='orders')\
.reset_index()
print(new_df)
orders users_ordered % of total
0 20 4 25%
1 10 4 25%
2 50 4 25%
3 1 4 25%
Or:
new_df = \
pd.concat((df['orders'].value_counts().rename('users_ordered'),
df['orders'].value_counts(normalize=True).mul(100)
.astype(int)
.astype(str).add('%').rename('% of total')), axis=1)\
.rename_axis(index='orders')\
.reset_index()
CodePudding user response:
This works:
new_df = df.groupby('orders')['uid'].count().reset_index(name='users_ordered')
new_df['% of total'] = new_df['users_ordered'].div(new_df['users_ordered'].sum()).mul(100).astype(int).astype(str).add('%')
Output:
>>> new_df
orders users_ordered % of total
0 1 4 25%
1 10 4 25%
2 20 4 25%
3 50 4 25%