Pandas groupby cumulative sum start from 0

Time:11-17

I have a DataFrame like the first table below. I would like to add a column holding the cumulative count of no-shows, i.e. the number of previous no-shows for each person. In the new column, called previous-missed-appointments, the count should start from 0 for each person, as in the second table.

   name        day   show-in-appointment
0  Jack   2020/01/01   show
1  Jack   2020/01/02   no-show
2  Jill   2020/01/02   no-show
3  Jack   2020/01/03   show
4  Jill   2020/01/03   show
5  Jill   2020/01/04   no-show
6  Jack   2020/01/04   show
7  Jill   2020/01/05   show
8  jack   2020/01/06   no-show
9  jack   2020/01/07   show

   name        day   show-in-appointment     previous-missed-appointments
0  Jack   2020/01/01   show                              0
1  Jack   2020/01/02   no-show                           0
2  Jill   2020/01/02   no-show                           0 
3  Jack   2020/01/03   show                              1
4  Jill   2020/01/03   show                              1
5  Jill   2020/01/04   no-show                           1
6  Jack   2020/01/04   show                              1
7  Jill   2020/01/05   show                              2
8  jack   2020/01/06   no-show                           1
9  jack   2020/01/07   show                              2

I tried various combos of df.groupby and df.agg(lambda x: cumsum(x)) to no avail.

CodePudding user response:

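import pandas as pd

# Normalize names so "jack" and "Jack" fall into the same group
df['name'] = df['name'].str.capitalize()
# Remember the original row order so it can be restored at the end
df['order'] = df.index
df['day'] = pd.to_datetime(df['day'])
# 1 for a missed appointment, 0 otherwise
df['noshow'] = df['show-in-appointment'].map({'show': 0, 'no-show': 1})
df = df.sort_values(by=['name', 'day'])
# Running total of no-shows per person (this includes the current row)
df['previous-missed-appointments'] = df.groupby('name')['noshow'].cumsum()
# Take the current row's own no-show back out so each person starts at 0
df.loc[df['noshow'] == 1, 'previous-missed-appointments'] -= 1
df = df.sort_values(by='order')
df = df.drop(columns=['noshow', 'order'])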
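
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is reconstructed from the table in the question, and the `person` and `missed` helper Series are names I made up for illustration. It uses a slightly different shortcut from the code above: subtract the flag from the running total instead of patching rows with `.loc`. It assumes each person's rows are already in date order, so no sorting is done.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Jill", "Jack", "Jill",
             "Jill", "Jack", "Jill", "jack", "jack"],
    "day": ["2020/01/01", "2020/01/02", "2020/01/02", "2020/01/03",
            "2020/01/03", "2020/01/04", "2020/01/04", "2020/01/05",
            "2020/01/06", "2020/01/07"],
    "show-in-appointment": ["show", "no-show", "no-show", "show", "show",
                            "no-show", "show", "show", "no-show", "show"],
})

# Group "jack" and "Jack" together without changing the name column itself.
person = df["name"].str.capitalize()

# 1 for a missed appointment, 0 otherwise.
missed = (df["show-in-appointment"] == "no-show").astype(int)

# Running total per person minus the current row, so only earlier
# appointments are counted. Assumes rows are in date order per person.
df["previous-missed-appointments"] = missed.groupby(person).cumsum() - missed

print(df)
```

Because the grouping key is a separate Series, the original spelling in the name column is left untouched, which matches the desired output in the question.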

CodePudding user response:

I think the two main methods you need are groupby and cumsum.

Have a look at the code below. Note that cumsum counts the current row as well, so the flag is subtracted again to leave only the previous no-shows:

import numpy as np

df.sort_values(by=['name', 'day'], inplace=True, ignore_index=True)
df['check'] = np.where(df['show-in-appointment'] == 'no-show', 1, 0)
df['previous-miss'] = df.groupby('name')['check'].cumsum() - df['check']
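
To count only earlier rows, so that each person starts at 0, the flags can also be shifted by one position within each group before accumulating. The sketch below is just one way to write that; `previous_count` and the small `demo` frame are hypothetical names for illustration, not part of the answer above.

```python
import pandas as pd

def previous_count(flags: pd.Series, keys: pd.Series) -> pd.Series:
    """Per-key running total of `flags`, excluding the current row."""
    # Move each group's flags down one row; the first row of each group gets 0.
    shifted = flags.groupby(keys).shift(fill_value=0)
    # Accumulate the shifted flags within each group.
    return shifted.groupby(keys).cumsum()

# Tiny illustrative frame (made-up subset, already in date order per person).
demo = pd.DataFrame({
    "name": ["Jack", "Jack", "Jill", "Jack", "Jill"],
    "check": [0, 1, 1, 0, 0],
})
demo["previous-miss"] = previous_count(demo["check"], demo["name"])
print(demo)
```

The shift variant reads a bit more literally as "sum of everything before this row", at the cost of a second groupby.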