I’d like to filter on multiple conditions within a data frame. For instance, find dates where Open > 10
and Close < 50
within the next n days of a Open > 10
date.
import yfinance as yf
data = yf.download(‘spy’, start=‘1990-01-01’ end=‘2000-01-01’)
data[(data[‘Open’] > 10) & (data[‘Close’] < 50 in data[‘Date’] to data[‘Date’] 7)]
How can I filter a dataframe on correlated dates with a condition?
CodePudding user response:
If I understand correctly, you need prices greater than 10, but less than 50, after the seventh day of the start of the data? Use straight quotes and there should be a comma after start in the query itself.
import yfinance as yf
data = yf.download('spy', start='1990-01-01', end='2000-01-01')
df = data[(data['Open'] > 10) & (data['Close'] < 50) & (data.index > data.index[6])]
print(df)
CodePudding user response:
Here's How I would do this.
`data = yf.download('spy', start='1990-01-01', end='2000-01-01')`
Output:
print(data)
Open High Low Close Adj Close Volume
Date
1993-01-29 43.96875 43.96875 43.750000 43.937500 25.438084 1003200
1993-02-01 43.96875 44.25000 43.968750 44.250000 25.619017 480500
1993-02-02 44.21875 44.37500 44.125000 44.343750 25.673300 201300
1993-02-03 44.40625 44.84375 44.375000 44.812500 25.944689 529400
1993-02-04 44.96875 45.09375 44.468750 45.000000 26.053246 531500
... ... ... ... ... ...
1999-12-27 146.50000 146.78125 145.062500 146.281250 96.697540 2691000
1999-12-28 145.87500 146.50000 145.484375 146.187500 96.635559 4084500
1999-12-29 146.31250 146.81250 145.312500 146.812500 97.048737 3001000
1999-12-30 147.12500 147.56250 146.187500 146.640625 96.935066 3641300
1999-12-31 146.84375 147.50000 146.250000 146.875000 97.090034 3172700
[1749 rows x 6 columns]
Filter out the data to only get the rows where
'Open'
> 10 and'Close'
< 50df = data[(data['Open'] > 10) & (data['Close'] < 50)]
Output:
print(df)
Open High Low Close Adj Close Volume
Date
1993-01-29 43.968750 43.96875 43.750000 43.937500 25.438084 1003200
1993-02-01 43.968750 44.25000 43.968750 44.250000 25.619017 480500
1993-02-02 44.218750 44.37500 44.125000 44.343750 25.673300 201300
1993-02-03 44.406250 44.84375 44.375000 44.812500 25.944689 529400
1993-02-04 44.968750 45.09375 44.468750 45.000000 26.053246 531500
... ... ... ... ... ...
1995-03-17 49.437500 49.62500 49.406250 49.562500 30.364269 89900
1995-03-20 49.625000 49.62500 49.468750 49.562500 30.364269 91700
1995-03-21 49.562500 49.87500 49.359375 49.437500 30.287670 104400
1995-03-22 49.531250 49.53125 49.328125 49.484375 30.316399 74900
1995-03-23 49.421875 49.65625 49.359375 49.515625 30.335543 220500
[543 rows x 6 columns]
Create a column that shifts the dates, essentially creating a column that contains the following date in that fitlered dataframe
df = df.reset_index(drop=False)
df['Next_Date'] = df['Date'].shift(-1)
Output:
print(df)
Date Open High ... Adj Close Volume Next_Date
0 1993-01-29 43.968750 43.96875 ... 25.438084 1003200 1993-02-01
1 1993-02-01 43.968750 44.25000 ... 25.619017 480500 1993-02-02
2 1993-02-02 44.218750 44.37500 ... 25.673300 201300 1993-02-03
3 1993-02-03 44.406250 44.84375 ... 25.944689 529400 1993-02-04
4 1993-02-04 44.968750 45.09375 ... 26.053246 531500 1993-02-05
.. ... ... ... ... ... ... ...
538 1995-03-17 49.437500 49.62500 ... 30.364269 89900 1995-03-20
539 1995-03-20 49.625000 49.62500 ... 30.364269 91700 1995-03-21
540 1995-03-21 49.562500 49.87500 ... 30.287670 104400 1995-03-22
541 1995-03-22 49.531250 49.53125 ... 30.316399 74900 1995-03-23
542 1995-03-23 49.421875 49.65625 ... 30.335543 220500 NaT
[543 rows x 8 columns]
Get the difference in days for each row and the following date column we just created
df['Difference'] = (df['Next_Date'] - df['Date']).dt.days
Output:
print(df)
Date Open High ... Volume Next_Date Difference
0 1993-01-29 43.968750 43.96875 ... 1003200 1993-02-01 3.0
1 1993-02-01 43.968750 44.25000 ... 480500 1993-02-02 1.0
2 1993-02-02 44.218750 44.37500 ... 201300 1993-02-03 1.0
3 1993-02-03 44.406250 44.84375 ... 529400 1993-02-04 1.0
4 1993-02-04 44.968750 45.09375 ... 531500 1993-02-05 1.0
.. ... ... ... ... ... ... ...
538 1995-03-17 49.437500 49.62500 ... 89900 1995-03-20 3.0
539 1995-03-20 49.625000 49.62500 ... 91700 1995-03-21 1.0
540 1995-03-21 49.562500 49.87500 ... 104400 1995-03-22 1.0
541 1995-03-22 49.531250 49.53125 ... 74900 1995-03-23 1.0
542 1995-03-23 49.421875 49.65625 ... 220500 NaT NaN
[543 rows x 9 columns]
Filter on that difference of days by your
"n_days"
n_days = 2
df = df[df['Difference'] <= n_days]
Output:
print(df)
Date Open High ... Volume Next_Date Difference
1 1993-02-01 43.96875 44.250000 ... 480500 1993-02-02 1.0
2 1993-02-02 44.21875 44.375000 ... 201300 1993-02-03 1.0
3 1993-02-03 44.40625 44.843750 ... 529400 1993-02-04 1.0
4 1993-02-04 44.96875 45.093750 ... 531500 1993-02-05 1.0
6 1993-02-08 44.96875 45.125000 ... 596100 1993-02-09 1.0
.. ... ... ... ... ... ... ...
536 1995-03-15 49.50000 49.578125 ... 278500 1995-03-16 1.0
537 1995-03-16 49.43750 49.812500 ... 20400 1995-03-17 1.0
539 1995-03-20 49.62500 49.625000 ... 91700 1995-03-21 1.0
540 1995-03-21 49.56250 49.875000 ... 104400 1995-03-22 1.0
541 1995-03-22 49.53125 49.531250 ... 74900 1995-03-23 1.0
[430 rows x 9 columns]
Full Code:
import yfinance as yf
data = yf.download('spy', start='1990-01-01', end='2000-01-01')
n_days = 2
df = data[(data['Open'] > 10) & (data['Close'] < 50)]
df = df.reset_index(drop=False)
df['Next_Date'] = df['Date'].shift(-1)
df['Difference'] = (df['Next_Date'] - df['Date']).dt.days
df = df[df['Difference'] <= n_days]
CodePudding user response:
supposing you use pandas, try this, i used this code for only one condition, but most likely you can expand to use multiple conditions, put your condition logic in the condition variable
# let df be your dataframe
# true or false series over condition
condition = df['state'] == ''
# list of true false values
condition = condition.values
# df rows given condition
df = df.loc[condition]