How to build sequence of purchases for each ID?

Time:07-03

I want to create a dataframe that shows the sequence of products each user purchased, ordered by the sequence column. For example, this is my current df:

user_id | sequence | product | price
1       | 1        | A       | 10
1       | 2        | C       | 15
1       | 3        | G       | 1
2       | 1        | B       | 20
2       | 2        | T       | 45
2       | 3        | A       | 10
...

I want to convert it to the following format:

user_id | source_product | target_product | cum_total_price
1       | A              | C              | 25
1       | C              | G              | 26
2       | B              | T              | 65
2       | T              | A              | 75
...

How can I achieve this?

CodePudding user response:

Use shift and cumsum inside a groupby.apply:

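import pandas as pd

def seq(g):
    g = g.copy()  # work on a copy so the original group is not mutated
    g['source_product'] = g['product']
    g['target_product'] = g['product'].shift(-1)  # next purchase in the sequence
    # running total of price, aligned to the row of the source product
    g['cum_total_price'] = g['price'].cumsum().shift(-1)
    # the last purchase of each user has no target, so drop it
    return g[['user_id', 'source_product', 'target_product', 'cum_total_price']].iloc[:-1]

df.sort_values('sequence').groupby('user_id', group_keys=False).apply(seq)

#   user_id source_product target_product  cum_total_price
#0        1              A              C             25.0
#1        1              C              G             26.0
#3        2              B              T             65.0
#4        2              T              A             75.0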
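
The same result can probably be had without apply, by calling shift and cumsum directly on the grouped columns. This is only a sketch under a few assumptions: the function name purchase_transitions is made up for illustration, the sample frame is copied from the question, and product and price are assumed to have no missing values. It may also be the safer route on recent pandas releases, which warn when groupby.apply touches the grouping column.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "user_id":  [1, 1, 1, 2, 2, 2],
    "sequence": [1, 2, 3, 1, 2, 3],
    "product":  ["A", "C", "G", "B", "T", "A"],
    "price":    [10, 15, 1, 20, 45, 10],
})

def purchase_transitions(df):
    """Hypothetical helper: one row per consecutive pair of purchases."""
    # Order each user's purchases by the sequence column
    out = df.sort_values(["user_id", "sequence"]).copy()

    out["source_product"] = out["product"]
    # Next product bought by the same user (NaN for the last purchase)
    out["target_product"] = out.groupby("user_id")["product"].shift(-1)

    # Per-user running total, then shifted so each row carries the
    # total up to and including the target product
    out["cum_total_price"] = out.groupby("user_id")["price"].cumsum()
    out["cum_total_price"] = out.groupby("user_id")["cum_total_price"].shift(-1)

    # The last purchase of each user has no target, so drop it
    out = out.dropna(subset=["target_product"])
    cols = ["user_id", "source_product", "target_product", "cum_total_price"]
    return out[cols].reset_index(drop=True)

result = purchase_transitions(df)
print(result)
```

The cum_total_price column comes back as float because the shift introduces NaN before the last rows are dropped. Cast it with astype(int) if the prices are known to be whole numbers.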