Currently I do it this way:
import pandas as pd

dt = pd.DataFrame({
    '1st': [1, 0, 1, 0, 1],
    '2nd': [2, 1, 2, 1, 2],
    '3rd': [3, 0, 3, 2, 3],
    '4th': [4, 3, 4, 3, 4],
    '5th': [5, 0, 5, 4, 5],
    'minute_traded': [6, 5, 6, 5, 6]
})

# Keep only rows whose values strictly increase from left to right
dt = dt[
    (dt['1st'] < dt['2nd']) &
    (dt['2nd'] < dt['3rd']) &
    (dt['3rd'] < dt['4th']) &
    (dt['4th'] < dt['5th']) &
    (dt['5th'] < dt['minute_traded'])
]
print(dt)
Result:
   1st  2nd  3rd  4th  5th  minute_traded
0    1    2    3    4    5              6
2    1    2    3    4    5              6
3    0    1    2    3    4              5
4    1    2    3    4    5              6
Is there a more general way to do this kind of check, one that always applies the same pattern (each column less than the next) and only changes which columns are compared?
CodePudding user response:
You can subtract from each column the column to its right and keep the rows where all of those differences are less than 0:
dt.loc[dt.diff(-1, axis="columns").iloc[:, :-1].lt(0).all(axis="columns")]
(The .iloc[:, :-1] drops the rightmost difference column, which is all NaN because there is no column to its right.)
This gives:
   1st  2nd  3rd  4th  5th  minute_traded
0    1    2    3    4    5              6
2    1    2    3    4    5              6
3    0    1    2    3    4              5
4    1    2    3    4    5              6
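Since the question asks about changing only the columns to be analyzed, here is a minimal sketch of the same diff approach restricted to a chosen list of columns. The `cols` list and the `df` name are assumptions made for illustration; the data is the original, unfiltered frame from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "1st": [1, 0, 1, 0, 1],
    "2nd": [2, 1, 2, 1, 2],
    "3rd": [3, 0, 3, 2, 3],
    "4th": [4, 3, 4, 3, 4],
    "5th": [5, 0, 5, 4, 5],
    "minute_traded": [6, 5, 6, 5, 6],
})

# Hypothetical selection: only these columns are compared, in this order.
cols = ["2nd", "4th", "minute_traded"]

# Each selected column minus the selected column to its right.
# The last column has no right neighbour, so its differences are NaN and are dropped.
diffs = df[cols].diff(-1, axis="columns").iloc[:, :-1]

# A row qualifies when every difference is negative (strictly increasing values).
mask = diffs.lt(0).all(axis="columns")
result = df.loc[mask]
print(result)
```

With this particular selection every row qualifies, including row 1, which fails the check when all six columns are compared.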
CodePudding user response:
Use shift to compare each column with the one to its left, then all to aggregate the comparisons into a single boolean per row for boolean indexing:
out = dt[dt.shift(axis=1).lt(dt).iloc[:, 1:].all(axis=1)]
Output:
   1st  2nd  3rd  4th  5th  minute_traded
0    1    2    3    4    5              6
2    1    2    3    4    5              6
3    0    1    2    3    4              5
4    1    2    3    4    5              6
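If the check is needed in several places, the shift approach could be wrapped in a small helper. This is only a sketch: the function name `filter_strictly_increasing` and its `cols` parameter are hypothetical, not part of pandas.

```python
import pandas as pd


def filter_strictly_increasing(df, cols):
    """Keep rows whose values strictly increase across `cols`, left to right.

    Hypothetical helper for illustration; `cols` is an ordered list of column names.
    """
    sub = df[cols]
    # Shift moves each column one position to the right, so lt() compares
    # "left neighbour < current". The first column has no left neighbour
    # (NaN after the shift), so it is skipped with iloc[:, 1:].
    mask = sub.shift(axis=1).lt(sub).iloc[:, 1:].all(axis=1)
    return df[mask]


dt = pd.DataFrame({
    "1st": [1, 0, 1, 0, 1],
    "2nd": [2, 1, 2, 1, 2],
    "3rd": [3, 0, 3, 2, 3],
    "4th": [4, 3, 4, 3, 4],
    "5th": [5, 0, 5, 4, 5],
    "minute_traded": [6, 5, 6, 5, 6],
})

# Check all columns in their existing order.
out = filter_strictly_increasing(dt, list(dt.columns))
print(out)
```

Passing a different list, for example `["1st", "3rd", "5th"]`, reuses the same pattern on another set of columns without rewriting the chained comparisons.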