I need to group a dataframe by "Day" and divide the values from column "Count" ("Clicked"/"Delivered")
Day | Status | Channel | Count |
---|---|---|---|
2022-11-08 | Delivered | SMS | 288322 |
2022-11-08 | Clicked | SMS | 1132 |
2022-11-13 | Delivered | SMS | 246203 |
2022-11-13 | Clicked | SMS | 4684 |
2022-11-19 | Delivered | SMS | 154872 |
2022-11-19 | Clicked | SMS | 1222 |
2022-11-20 | Delivered | SMS | 253716 |
2022-11-20 | Clicked | SMS | 2384 |
I used pandas .groupby() to group by "Day", but i don´t know how to specify to divide "Count". Like if I needed to sum values I'd use .sum(), but there isn´t a .divide() unfortunally.
CodePudding user response:
1. Using .unstack()
(when one row of each type exists)
You don't need groupby
for this if you only have 1 clicked and 1 delivered value per date (as your example suggests).
You can use .unstack()
to get the result you need. It doesn't need any sorting of rows as well, only that there are 2 rows per date (one for Clicked, and other for Delivered). Here is how the 2-3 line code below works -
- Set Day and Status as multiindex
- Unstack the Status index to get 2 columns,
Clicked
andDelivered
- Simply divide these 2 columns after that and reset index
s = df.set_index(['Day','Status'])['Count'].unstack() #<-- Unstack status index
s['Ratio'] = s['Clicked']/s['Delivered'] #<-- Divide clicked & delivered
s = s.reset_index() #<-- Reset index
print(s)
Status Day Clicked Delivered Ratio
0 2022-11-08 1132 288322 0.003926
1 2022-11-13 4684 246203 0.019025
2 2022-11-19 1222 154872 0.007890
3 2022-11-20 2384 253716 0.009396
2. Using .groupby()
.unstack()
(when additional rows of each type exist)
Just to be on the safer side, if you do have multiple rows for "clicked" or "delivered" respectively for a given date (for example, 2 clicked rows and 4 delivered rows for the same date), you can sum the quantities to get a single clicked and a single delivered row, using groupby
and then do the same process as above.
- Group by Day and Status columns and take sum of counts
- Unstack the Status index to get 2 columns,
Clicked
andDelivered
- Simply divide these 2 columns after that and reset index
g = df.groupby(['Day','Status'])['Count'].sum().unstack() #<-- Unstack status index
g['Ratio'] = g['Clicked']/g['Delivered'] #<-- Divide clicked & delivered
g = g.reset_index() #<-- Reset index
print(g)
Status Day Clicked Delivered Ratio
0 2022-11-08 1132 288322 0.003926
1 2022-11-13 4684 246203 0.019025
2 2022-11-19 1222 154872 0.007890
3 2022-11-20 2384 253716 0.009396
CodePudding user response:
You may group by two columns at the same time.
As far as I understand you've already done smth like this:
df.groupby(df.Day)
Right?
You may do:
df.groupby([df.Day, df.Status])
df.groupby([df.Day, df.Status]).indices
:
{('2022-11-08', 'Clicked'): array([1], dtype=int64),
('2022-11-08', 'Delivered'): array([0], dtype=int64),
('2022-11-13', 'Clicked'): array([3], dtype=int64),
('2022-11-13', 'Delivered'): array([2], dtype=int64),
('2022-11-19', 'Clicked'): array([5], dtype=int64),
('2022-11-19', 'Delivered'): array([4], dtype=int64),
('2022-11-20', 'Clicked'): array([7], dtype=int64),
('2022-11-20', 'Delivered'): array([6], dtype=int64)}
CodePudding user response:
Here is one way to do it with GroupBy.agg
:
out = (
df.groupby(["Day", "Channel"],
as_index=False, sort=False)
["Count"].agg(["first", "last"])
.assign(result= lambda x: x.pop("last").div(x.pop("first")))
# .assign(result= lambda x: x["last"].div(x["first"])) #if you need to preserve clicked/delivered values
.reset_index()
)
# Output :
print(out)
Day Channel result
0 2022-11-08 SMS 0.003926
1 2022-11-13 SMS 0.019025
2 2022-11-19 SMS 0.007890
3 2022-11-20 SMS 0.009396
NB : You need to make sure that the rows are sorted with Delivered/Clicked.