Is it possible to specify a range of rows when using the pandas.DataFrame.shift
method or is there a similar approach I can use?
Example
import pandas as pd
df = pd.DataFrame(
    {"Col1": [True, False, False, True, False],
     "Col2": [0, 1, 2, 3, 4]},
    index=pd.date_range("2020-01-01", "2020-01-05"))
To check the current row and the previous row, I can use the following query:
df.query("(Col1 == True) & (Col2 == 2) | (Col1 == True) & (Col2.shift(1) == 2)")
What if I also want to check whether Col2.shift(2), Col2.shift(3), etc. equal 2 for the current row?
Do I need to add more conditions like | (Col1 == True) & (Col2.shift(2) == 2), or is there a way to specify a range for shift?
My preference is to use query, but I am open to other approaches.
CodePudding user response:
You could use rolling.max.
First, for the boolean condition:
(A&B) | (A&C) | ...
is equivalent to
((A&B) | A) & ((A&B)|C)
is equivalent to
A & (B|C)
You can prove by induction that this holds for any number of conditions, so you can factor the "Col1" condition out of the nesting, chain the "Col2" conditions with |, and apply a single & with "Col1", like:
df['Col1'] & ( df['Col2'].eq(2) | df['Col2'].shift(1).eq(2) )
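As a quick sanity check of the factoring above, the sketch below builds both forms on the question's example frame and compares them. The names expanded, factored and same are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"Col1": [True, False, False, True, False],
     "Col2": [0, 1, 2, 3, 4]},
    index=pd.date_range("2020-01-01", "2020-01-05"))

# Expanded form: the Col1 condition is repeated in every branch.
expanded = (df["Col1"] & df["Col2"].eq(2)) | (df["Col1"] & df["Col2"].shift(1).eq(2))

# Factored form: the Col1 condition is applied once to the OR of the Col2 checks.
factored = df["Col1"] & (df["Col2"].eq(2) | df["Col2"].shift(1).eq(2))

# True when both masks agree on every row.
same = expanded.equals(factored)
print(same)
```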
Then, since you want True to be returned if even a single day in the rolling window has a value equal to 2, max should suffice.
rng = 2  # window size: the current row plus the previous rng - 1 rows
out = df[df['Col2'].eq(2).rolling(rng).max().fillna(0).astype(bool) & df['Col1']]
Output:
            Col1  Col2
2020-01-04  True     3
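To cover shift(0) through shift(3) in one go, the same idea can be wrapped in a small helper. This is only a sketch: any_recent_match is a hypothetical name, and min_periods=1 is my own addition so that the first rows, which have fewer previous rows than the window, are still evaluated instead of becoming NaN. The loop over explicit shifts is there only to cross-check the rolling result.

```python
import pandas as pd

def any_recent_match(s, value, window):
    """True where `value` occurs in the current row or the previous window - 1 rows."""
    hits = s.eq(value).astype(int)
    # min_periods=1 (an assumption, not in the answer above) keeps partial
    # windows at the start of the series from turning into NaN.
    return hits.rolling(window, min_periods=1).max().astype(bool)

df = pd.DataFrame(
    {"Col1": [True, False, False, True, False],
     "Col2": [0, 1, 2, 3, 4]},
    index=pd.date_range("2020-01-01", "2020-01-05"))

# window=4 corresponds to shift(0), shift(1), shift(2) and shift(3).
recent = any_recent_match(df["Col2"], 2, window=4)
result = df[df["Col1"] & recent]

# Cross-check against the explicit chain of shifted comparisons.
explicit = pd.Series(False, index=df.index)
for k in range(4):
    explicit = explicit | df["Col2"].shift(k).eq(2)

print(result)
print(recent.equals(explicit))
```

On this example frame the only matching row is still 2020-01-04, because Col1 is False on the other dates where a 2 falls inside the window.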