Home > Mobile >  Comparing previous rows in two columns of a DataFrame
Comparing previous rows in two columns of a DataFrame

Time:11-27

I have a dataframe of transactions with the unique ID of a product, seller and buyer. I want to keep a record whether someone has bought a product and later resold it. Here's a simplified view of my dataset:

       prod_id seller buyer
0       cc_123      x     y
1       cc_111      d     y
2       cc_025      y     x
3       cc_806      d     m
4       cc_963      a     b
5       cc_235      o     h
6       cc_806      m     t
7       cc_555      z     w
8       cc_444      s     q

My initial idea was to group by id and compare consecutive rows in the grouped dataframe, by checking if the seller of the current row is the same person as the buyer of the previous row for a given product, e.g., transaction 6 is a resale because the same person "m" previously bought and is now selling product cc_806:

       prod_id seller buyer
3       cc_806      d     m
6       cc_806      m     t

So the final dataset would look like this:

       prod_id seller buyer    resale
0       cc_123      x     y         0
1       cc_111      d     y         0
2       cc_025      y     x         0
3       cc_806      d     m         0
4       cc_963      a     b         0
5       cc_235      o     h         0
6       cc_806      m     t         1
7       cc_555      z     w         0
8       cc_444      s     q         0

Where 1 means yes/true and 0 means no/false. My attempt is not working:

df['resale'] = df.groupby('prod_id')['seller'] == df.groupby('prod_id')['buyer'].shift(1)

Is there an efficient solution for this?

CodePudding user response:

Your attempt almost got there. For sellers, it has no need to groupby.

df['resale'] = df.groupby('prod_id')['buyer'].shift(1) == df['seller']
df['resale'] = df['resale'].astype(int)
df

output:

  prod_id seller buyer  resale
0  cc_123      x     y       0
1  cc_111      d     y       0
2  cc_025      y     x       0
3  cc_806      d     m       0
4  cc_963      a     b       0
5  cc_235      o     h       0
6  cc_806      m     t       1
7  cc_555      z     w       0
8  cc_444      s     q       0

CodePudding user response:

use this

df["resale"]=df.apply(lambda x: len(df[(df["prod_id"]==x["prod_id"])& (df["buyer"]==x["seller"])]),axis=1)

output

  prod_id seller buyer  resale
0  cc_123      x     y       0
1  cc_111      d     y       0
2  cc_025      y     x       0
3  cc_806      d     m       0
4  cc_963      a     b       0
5  cc_235      o     h       0
6  cc_806      m     t       1
7  cc_555      z     w       0
8  cc_444      s     q       0

  • Related