I have this dataframe:
retailer_id true_channel
28 96 organic
29 96 organic
30 96 retailer
63 174 organic
64 174 organic
65 174 retailer
66 174 retailer
67 174 organic
155 325 retailer
156 325 retailer
I want it to look like this
retailer_id true_channel funnel
28 96 organic organic
29 96 organic organic
30 96 retailer then retailer
63 174 organic organic
64 174 organic organic
65 174 retailer then retailer
66 174 retailer then retailer
67 174 organic then organic
155 325 retailer retailer
156 325 retailer retailer
and so on...
I tried creating this code:
retailers = visits_orders['retailer_id'].unique()
output = []
for retailer in retailers:
this_retailer_df = visits_orders.loc[visits_orders['retailer_id']==retailer,'true_channel']
for i,channel in enumerate(this_retailer_df[:-1]):
if i == 0:
output.append(channel)
else:
if channel == this_retailer_df.iloc[i 1]:
output.append(channel)
else:
output.append(f"Then {channel}")
visits_orders['funnel_try1'] = output
but off course due to poor structure, It produced a lot of NaNs and slow performance. also, it did not capture the logic that well
CodePudding user response:
You can use a groupby.apply
with a shift
and cumsum
to form a mask
:
# create groups of successive identical true_channel
# per retailer_id and enumerate them
order = (df.groupby('retailer_id', group_keys=False)['true_channel']
.apply(lambda s: s.ne(s.shift()).cumsum())
)
# each enumeration that is > 1 is not the first
# let's mask it with the value prepended with "then_"
df['funnel'] = df['true_channel'].mask(order.gt(1), 'then_' df['true_channel'])
Output:
retailer_id true_channel funnel
28 96 organic organic
29 96 organic organic
30 96 retailer then_retailer
63 174 organic organic
64 174 organic organic
65 174 retailer then_retailer
66 174 retailer then_retailer
67 174 organic then_organic
155 325 retailer retailer
156 325 retailer retailer