Python Pandas - Shift grouped values to next date

Time:12-08

I'm trying to build a model for time-series analysis, and I want to create some features that contain values from previous rows.

I have a dataset as follows:

item_id  shop_id  items_sold  date_block_id
1        1        5           0
1        2        10          0
1        1        7           1
2        2        8           1
2        1        5           1
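To make this reproducible, the sample table can be built as a DataFrame like this (a minimal sketch; the column names and values are copied from the table above):

```python
import pandas as pd

# The question's sample data: one row per item/shop/date-block combination
# that had at least one sale.
df = pd.DataFrame(
    {
        "item_id": [1, 1, 1, 2, 2],
        "shop_id": [1, 2, 1, 2, 1],
        "items_sold": [5, 10, 7, 8, 5],
        "date_block_id": [0, 0, 1, 1, 1],
    }
)
print(df)
```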

I want to add a column containing the number of items sold for each item/shop combination in the previous date block (0 if there were no sales), so something like:

item_id  shop_id  items_sold  date_block_id  lag1_items_sold
1        1        5           0              0
1        2        10          0              0
1        1        7           1              5
2        2        8           1              0
2        1        5           1              0

I think what makes this hard is that if no sale is made within a date block, the table contains no row for that item/shop combination in that block. I have tried all sorts of combinations of grouping and shifting, but I can't get the result I want.
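To make the difficulty concrete, here is a small hypothetical example (made-up data, not from the question) where one item/shop pair sells in blocks 0 and 2 but not in block 1. A plain group-wise shift moves the previous row, not the previous date block, so the block 2 row picks up block 0's sales:

```python
import pandas as pd

# Hypothetical data: item 1 at shop 1 has sales in blocks 0 and 2 only,
# so there is no row at all for block 1.
df = pd.DataFrame(
    {
        "item_id": [1, 1],
        "shop_id": [1, 1],
        "items_sold": [5, 7],
        "date_block_id": [0, 2],
    }
)

# shift() within each item/shop group looks at the previous row in the group.
naive = df.groupby(["item_id", "shop_id"])["items_sold"].shift().fillna(0)
print(naive.tolist())  # [0.0, 5.0]
```

For the block 2 row the wanted value would be 0, because nothing was sold in block 1.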

Answer:

Look at DataFrame.shift combined with groupby. Docs for shift here.

Older answer with code here. Try searching through SO next time and you might find the answer!

Answer:

Try:

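# Sort by date block first so shift() sees each item/shop group in time order;
# the shifted values are aligned back onto df by index.
df["lag1_items_sold"] = (
    df.sort_values("date_block_id", kind="stable")
    .groupby(["item_id", "shop_id"])["items_sold"]
    .shift()
    .fillna(0)
)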

print(df)

Prints:

   item_id  shop_id  items_sold  date_block_id  lag1_items_sold
0        1        1           5              0              0.0
1        1        2          10              0              0.0
2        1        1           7              1              5.0
3        2        2           8              1              0.0
4        2        1           5              1              0.0
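One caveat: groupby().shift() takes the previous row within each item/shop group, which equals the previous date block only if that combination sold something in every block. A minimal sketch of an alternative that matches on date_block_id itself, assuming the column names from the question, is to merge the table with a copy of itself moved forward by one block. Summing per item/shop/block first keeps the merge one-to-one even if the raw data has several rows per combination:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "item_id": [1, 1, 1, 2, 2],
        "shop_id": [1, 2, 1, 2, 1],
        "items_sold": [5, 10, 7, 8, 5],
        "date_block_id": [0, 0, 1, 1, 1],
    }
)

keys = ["item_id", "shop_id", "date_block_id"]

# Total sales per item/shop/block, then move each block forward by one so the
# row for block b carries the sales from block b - 1.
prev = df.groupby(keys, as_index=False)["items_sold"].sum()
prev["date_block_id"] += 1
prev = prev.rename(columns={"items_sold": "lag1_items_sold"})

# Left merge keeps every original row; combinations with no sale in the
# previous block find no match, get NaN, and are filled with 0.
out = df.merge(prev, on=keys, how="left")
out["lag1_items_sold"] = out["lag1_items_sold"].fillna(0)
print(out)
```

On the sample data this gives the same lag column as above, but a pair that skips a block (sales in blocks 0 and 2 only) gets 0 for block 2 rather than block 0's sales.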