Iterate a function over individual rows until a condition is met, then move on to the next row

I am working with the lifetimes library to build a customer lifetime value model. The library comes with a method called conditional_expected_number_of_purchases_up_to_time that allows you to predict purchases for each customer in your data set over a specified time period.

Here is the dataframe I am working with:

    import pandas as pd

    df = pd.DataFrame([['[email protected]', 6.0, 112.0, 139.0],
                       ['[email protected]', 11.0, 130.0, 130.0]],
                      columns=['email', 'frequency', 'recency', 'T'])

Each row in the dataframe represents an individual customer. To predict the expected number of purchases for each customer over the next 4 periods, I would run the following code:

    # mbgf is the already-fitted lifetimes model (fitting not shown)
    t = 4
    df['est_purchases'] = mbgf.conditional_expected_number_of_purchases_up_to_time(
        t, df['frequency'], df['recency'], df['T'])

Next, for each row in the dataframe, I would like to approximate the total number of purchases the customer will make over the rest of their lifetime. Let's call this quantity Residual Customer Purchases (RCP).
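Written as formulas, the quantity looks like this. This is a sketch: the notation E[N(t)] for the model's expected number of purchases up to period t (what conditional_expected_number_of_purchases_up_to_time returns for one customer), the stopping period t*, and ε for the tolerance are my own labels, not the library's.

$$
\Delta(t) = \mathbb{E}[N(t)] - \mathbb{E}[N(t-1)], \qquad
\mathrm{RCP} \approx \sum_{k=1}^{t^{*}} \Delta(k) = \mathbb{E}[N(t^{*})] - \mathbb{E}[N(0)]
$$

Here t* is the first period with Δ(t*) < ε. Because the sum telescopes, the total is simply the expected purchases up to t* minus the value at t = 0 (which should be zero for a purchase count), so the iteration is really a search for the stopping period.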

To do this, I have defined two functions. The first calculates the incremental RCP between two consecutive time periods; the second approximates the total RCP by increasing t step by step until the incremental RCP falls below a specified tolerance:

    # Function to calculate incremental RCP
    def RCP(row):
        dif = (mbgf.conditional_expected_number_of_purchases_up_to_time(t,
                  row['frequency'], row['recency'], row['T'])
               - mbgf.conditional_expected_number_of_purchases_up_to_time((t-1),
                  row['frequency'], row['recency'], row['T']))
        return dif

    # Create column for incremental RCP
    df['m_RCP'] = df.apply(RCP, axis=1)

    # Function to approximate total RCP
    def approximate(fn, model, rfm, t=1, eps_tol=1e-6, eps=0, **kwargs):
        eps = 0
        cf = 0
        while True:
            cf = df.apply(fn, axis=1)
            if (cf - eps < eps_tol):
                break
            eps = cf; t = 1
        return cf

    # Create column for total RCP
    df['t_RCP'] = df.apply(approximate(RCP, model=mbgf, rfm=df), axis=1)

The first function works as expected, but when I try to run the second function (approximate), I get this error:

    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I want the approximate function to iterate the RCP function for a single row until the RCP value no longer increases, and do this one by one for each row in the dataframe.

What am I doing wrong and what should I be doing instead?

Answer:

You are calling df.apply(fn, axis=1), which returns a Series (one value per row) that you assign to cf. Comparing cf - eps to a constant then gives a Series of booleans, not a single True or False. An if statement needs exactly one truth value, and pandas refuses to guess whether a boolean Series should count as true when any element is true or only when all of them are, so it raises this ValueError.
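A minimal reproduction, using made-up increments in place of real model output, shows the error and the two ways of reducing the comparison to a single bool. Which reduction is appropriate depends on intent: .all() means "stop once every customer has converged", which is not the per-row stopping the question asks for, so a per-row function is the better fit here.

```python
import pandas as pd

cf = pd.Series([0.5, 2e-7])  # hypothetical per-row increments
eps = 0
eps_tol = 1e-6

message = ""
try:
    if cf - eps < eps_tol:  # a whole Series of booleans, not one bool
        pass
except ValueError as err:
    message = str(err)  # pandas' "truth value of a Series is ambiguous" error

# Collapsing the boolean Series to one value removes the ambiguity.
converged = (cf - eps) < eps_tol              # [False, True]
all_converged = bool(converged.all())         # False: row 0 is still moving
any_converged = bool(converged.any())         # True: row 1 is below tolerance
```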

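What I would do is define a function iterated_RCP(row) that takes a single row of the dataframe and iterates RCP on that row until it converges. Because it works on one row at a time, every comparison inside it is between plain numbers, so the ambiguity never arises. Then store the result as a new column, e.g. df['t_RCP'] = df.apply(iterated_RCP, axis=1), or df = df.assign(t_RCP=df.apply(iterated_RCP, axis=1)); note that assign returns a new DataFrame rather than modifying df in place.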
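Here is a minimal sketch of such an iterated_RCP. It assumes the model is passed in as an extra argument and exposes conditional_expected_number_of_purchases_up_to_time(t, frequency, recency, T), as the fitted mbgf in the question does. StubModel, its decay rate, and the max_t safety cap are hypothetical, added only so the example runs without fitting a lifetimes model. The sketch also passes t explicitly. In the question's code, RCP reads the module-level t, so changing t inside approximate would not affect what RCP computes.

```python
import math
import pandas as pd


def iterated_RCP(row, model, eps_tol=1e-6, max_t=10_000):
    """Accumulate one customer's incremental expected purchases, period by
    period, until an increment falls below eps_tol (or max_t is reached)."""
    def expected(t):
        # Scalar call: frequency, recency and T come from this one row only.
        return model.conditional_expected_number_of_purchases_up_to_time(
            t, row['frequency'], row['recency'], row['T'])

    total = 0.0
    prev = expected(0)
    for t in range(1, max_t + 1):
        cur = expected(t)
        increment = cur - prev   # the question's RCP(row) at period t
        total += increment
        if increment < eps_tol:  # plain float comparison, so no ambiguity
            break
        prev = cur
    return total


class StubModel:
    """Hypothetical stand-in for a fitted lifetimes model (illustration only)."""

    def conditional_expected_number_of_purchases_up_to_time(self, t, frequency, recency, T):
        # Saturating curve: expected purchases approach `frequency` as t grows.
        return frequency * (1 - math.exp(-0.1 * t))


df = pd.DataFrame([[6.0, 112.0, 139.0], [11.0, 130.0, 130.0]],
                  columns=['frequency', 'recency', 'T'])

# Extra keyword arguments to DataFrame.apply are forwarded to the function.
df['t_RCP'] = df.apply(iterated_RCP, axis=1, model=StubModel())
```

With the real model the call would be df.apply(iterated_RCP, axis=1, model=mbgf). The max_t cap is there so a customer whose increments never drop below the tolerance cannot loop forever. Since the increments telescope, returning cur - expected(0) at the stopping period would give the same total.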