Home > Software engineering >  is there a way to perform logic on the following rows in pandas
is there a way to perform logic on the following rows in pandas

Time:01-24

I have this dataframe:

     retailer_id true_channel
28            96      organic
29            96      organic
30            96     retailer
63           174      organic
64           174      organic
65           174     retailer
66           174     retailer
67           174      organic
155          325     retailer
156          325     retailer

I want it to look like this

     retailer_id true_channel funnel
28            96      organic  organic
29            96      organic  organic
30            96     retailer  then retailer
63           174      organic  organic
64           174      organic  organic
65           174     retailer  then retailer
66           174     retailer  then retailer
67           174      organic  then organic
155          325     retailer  retailer
156          325     retailer  retailer

and so on...

I tried creating this code:

retailers = visits_orders['retailer_id'].unique()
output = []

for retailer in retailers:
    this_retailer_df = visits_orders.loc[visits_orders['retailer_id']==retailer,'true_channel']
    
    for i,channel in enumerate(this_retailer_df[:-1]):
        
        if i == 0:
                  output.append(channel)
        else:
            
            
            if channel == this_retailer_df.iloc[i   1]:
                       output.append(channel)
            else:
                       output.append(f"Then {channel}")

visits_orders['funnel_try1'] = output

but off course due to poor structure, It produced a lot of NaNs and slow performance. also, it did not capture the logic that well

CodePudding user response:

You can use a groupby.apply with a shift and cumsum to form a mask:

# create groups of successive identical true_channel
# per retailer_id and enumerate them
order = (df.groupby('retailer_id', group_keys=False)['true_channel']
           .apply(lambda s: s.ne(s.shift()).cumsum())
        )

# each enumeration that is > 1 is not the first
# let's mask it with the value prepended with "then_"
df['funnel'] = df['true_channel'].mask(order.gt(1), 'then_' df['true_channel'])

Output:

     retailer_id true_channel         funnel
28            96      organic        organic
29            96      organic        organic
30            96     retailer  then_retailer
63           174      organic        organic
64           174      organic        organic
65           174     retailer  then_retailer
66           174     retailer  then_retailer
67           174      organic   then_organic
155          325     retailer       retailer
156          325     retailer       retailer
  • Related