Home > database >  Pandas compare next row (several rows) by condition
Pandas compare next row (several rows) by condition

Time:07-03

I have a dataframe with two columns: price and pattern (can be 0 if absent or 1 if exists).

Price  Pattern
   10        0
   12        1
   15        0
   11        0
    9        0

First, i need to iterate to find row with existing pattern (pattern = 1), then

  1. compare price in current row (12) with price in next row (15). Basically i need to know how price changed (15 - 12) and put result in new column "diff_1_tf" so later i can use all of these values to see average / overall picture.
  2. compare price in current row (12) with price in third row from current row (9 - 12) and put result in new column "diff_3_tf".

I know that shift can be usefull but i just cant understand how to make it work in my case. I'm stuck here. Please help.

new_df = df[['price', "pattern"]].copy()
for row in new_df["pattern"]:    
    if row == 1:
        print(row)

Update: finally i solved my problem with new_df.iterrows() and index manipulations

CodePudding user response:

Could you post your dataframe with it's values, that would help.

Additionally, do you mean:

  1. compare price in row 1 with row 2 and place the bigger value in the new column.

  2. compare price in row 1 with row 3 and place the bigger value in the new column.

Some clarification would be useful.

CodePudding user response:

I guess that you can use pandas built-in function diff, which already has a magical parameter period and it finds a difference between i-th and 'i period-th` value. I extended your example, to verify if this is what you are looking for:

    Price  Pattern
0      10        0
1      12        1
2      15        0
3      11        0
4       9        0
5       8        1
6       5        0
7       4        1
8       7        0
9       9        0
10     11        0
11      2        1

Price change column:

df_new['price_change_%d' % period] = df.diff(periods=period).set_index('Pattern').loc[-1].reset_index(drop=True)

Outputs for different period values:

>> period = 1
0    3.0
1   -3.0
2    3.0
Name: Price, dtype: float64
 
>> period = 2
0   -1.0
1    5.0
Name: Price, dtype: float64
 
>> period = 3
0   -3.0
1   -1.0
2    7.0
Name: Price, dtype: float64

Also, for period == 2 note that it will output only 2 pairs of values because 6th and 8th positions both contain pattern 1. Not sure about those edge cases, and how would you prevent/post-process them?

  • Related