Home > Blockchain >  Calculate the period between payments, by customer using Pandas
Calculate the period between payments, by customer using Pandas

Time:10-26

I have a DataFrame:

import pandas as pd

df = pd.DataFrame({'Customer': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B','B','B','B','B','B', 'B'],
                   'Date': ['1/1/2021', '2/1/2021','3/1/2021', '4/1/2021','5/1/2021', '6/1/2021','7/1/2021','8/1/2021', '1/1/2021', '2/1/2021','3/1/2021', '4/1/2021','5/1/2021', '6/1/2021','7/1/2021', '8/1/2021'], 
                   'Amt': [0, 10, 10, 10, 0, 0, 0, 0, 0, 0, 10, 10, 0, 0, 10, 0]})

df

    Customer    Date    Amt
   0    A   1/1/2021    0
   1    A   2/1/2021    10
   2    A   3/1/2021    10
   3    A   4/1/2021    10
   4    A   5/1/2021    0
   5    A   6/1/2021    0
   6    A   7/1/2021    0
   7    A   8/1/2021    0
   8    B   1/1/2021    0
   9    B   2/1/2021    0
  10    B   3/1/2021    10
  11    B   4/1/2021    10
  12    B   5/1/2021    0
  13    B   6/1/2021    0
  14    B   7/1/2021    10
  15    B   8/1/2021    0

What I'm trying to calculate is the amount of time between purchases, when there's been a gap. So, for customer A, there is no gap...they just churned and have not come back. Moreover, for Customer B, they enter at 3/1/21 (i.e., first purchase), and would have a two-month gap for 5/1-6/1.

How would I go about calculating this by customer?

edit: Desired output: by customer, gap (i.e., number of periods, in this case months) enter image description here

CodePudding user response:

We can do groupby with ffill and bfill

s = df.Amt.mask(df.Amt==0)
g = s.groupby(df['Customer'])
checker = g.ffill().notna() & g.bfill().notna() & s.isna()
checker.groupby(df['Customer']).sum()
Out[244]: 
Customer
A    0
B    2
Name: Amt, dtype: int64

Update

x = checker.groupby([df['Customer'],checker.ne(True).cumsum()]).sum()
x[x.ne(0).idxmax()]
2
x[x.ne(0).groupby(level=0).idxmax()]
Customer  Amt
A         1      0
B         12     2
Name: Amt, dtype: int64

CodePudding user response:

see if below example can help:

def diffcalc (amt):
last = 0
diff = 0
for i in range (0, len(amt), 1):
    if amt[i] > 0:
        diff = (i - last-1)
        last = i
return diff

amt_a = [0, 10, 10, 10, 0, 0, 0, 0]
amt_b = [0, 10, 10, 0, 0, 10, 0, 0]
print('a', diffcalc(amt_a))
print('b', diffcalc(amt_b))
  • Related