Specify range of rows when using pandas.DataFrame.shift

Time:05-06

Is it possible to specify a range of rows when using the pandas.DataFrame.shift method or is there a similar approach I can use?

Example

df = pd.DataFrame(
    {"Col1": [True, False, False, True, False], 
     "Col2": [0, 1, 2, 3, 4]},
    index=pd.date_range("2020-01-01", "2020-01-05"))

To check the current row and the previous row, I can use the following query:

df.query("(Col1 == True) & (Col2 == 2) | (Col1 == True) & (Col2.shift(1) == 2)")

What if I also want to check whether Col2.shift(2), Col2.shift(3), etc. equal 2 for the current row?

Do I need to add more conditions like | (Col1 == True) & (Col2.shift(2) == 2), or is there a way to specify a range for shift?

Preference is to use query but open to other ways.

Answer:

You could use rolling.max.

First, for the boolean condition:

(A&B) | (A&C) | ...

is equivalent to

((A&B) | A) & ((A&B)|C)

is equivalent to

A & (B|C)

You can inductively prove that this holds for any number of conditions, so you can take the condition for "Col1" out of the nesting and chain the "Col2" conditions with | and do a single & with "Col1", like:

df['Col1'] & ( df['Col2'].eq(2) | df['Col2'].shift(1).eq(2) )
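The same factoring extends to any number of shifts. As a minimal sketch (the name max_shift is an assumption introduced here, not part of the question), the shifted "Col2" conditions can be OR-ed together in a loop and combined with "Col1" once:

```python
from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame(
    {"Col1": [True, False, False, True, False],
     "Col2": [0, 1, 2, 3, 4]},
    index=pd.date_range("2020-01-01", "2020-01-05"))

max_shift = 3  # hypothetical: look at the current row and up to 3 rows back

# True where Col2 equals 2 on that row
hit = df["Col2"].eq(2)

# OR together hit, hit.shift(1), ..., hit.shift(max_shift);
# fill_value=False keeps the dtype boolean at the start of the frame
any_hit = reduce(
    operator.or_,
    (hit.shift(k, fill_value=False) for k in range(max_shift + 1)))

# A & (B | C | ...): a single & with Col1
result = df[df["Col1"] & any_hit]
```

With the example frame this should keep only the 2020-01-04 row, since that is the only row where Col1 is True and Col2 was 2 within the last three rows.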

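Then, since you want True whenever at least one day in the rolling window has a value equal to 2, taking the rolling max is enough. Here rng is the window size and includes the current row, so rng = 2 covers the current row and the previous row.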

rng = 2
out = df[df['Col2'].eq(2).rolling(rng).max().fillna(0).astype(bool) & df['Col1']]

Output:

            Col1  Col2
2020-01-04  True     3
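Since the question prefers query, here is one possible way to combine the rolling mask with it. This is only a sketch: the function name rows_with_recent_match and the temporary column "recent" are made up for illustration, and min_periods=1 is used in place of fillna(0) so that the first rows are evaluated on partial windows.

```python
import pandas as pd

df = pd.DataFrame(
    {"Col1": [True, False, False, True, False],
     "Col2": [0, 1, 2, 3, 4]},
    index=pd.date_range("2020-01-01", "2020-01-05"))


def rows_with_recent_match(frame, value, window):
    """Rows where Col1 is True and Col2 equalled `value` in any of the
    last `window` rows (the current row counts as one of them)."""
    recent = (
        frame["Col2"].eq(value)
        .astype(int)                     # work on 0/1 instead of booleans
        .rolling(window, min_periods=1)  # partial windows at the start, no NaN
        .max()
        .astype(bool))
    # Attach the mask as a temporary column so query can refer to it by name
    return (frame.assign(recent=recent)
                 .query("Col1 & recent")
                 .drop(columns="recent"))


two = rows_with_recent_match(df, 2, 2)   # current row and previous row
four = rows_with_recent_match(df, 2, 4)  # current row and three rows back
```

A window of n corresponds to checking shift(0) through shift(n - 1), so widening the range only means changing one number instead of adding more conditions to the query string.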