I am trying to filter a dataframe by a date range, where I have the initial start_date and the end_date is x days after the start_date.
Basically, what I want is the equivalent of the WHERE date BETWEEN start_date AND DATE_ADD(start_date, INTERVAL x DAY) clause in SQL, with the second bound acting as end_date.
Here is an example of my dataframe:
+------------+-----------+
| date       | aggregate |
+------------+-----------+
| ...        | ...       |
| 2022-08-31 | 42        |
| 2022-09-01 | 30        |
| 2022-09-02 | 65        |
| 2022-09-03 | 55        |
| ...        | ...       |
+------------+-----------+
So I tried this in Python:
import pandas as pd
from datetime import datetime, timedelta
start_date = datetime.strptime("2022-08-31", "%Y-%m-%d")
end_date = start_date + timedelta(days=3)  # let's say I want a 3-day range
df_filtered = df[(df['date'] >= start_date) & (df['date'] < end_date)]
But it raised "UserWarning: Boolean Series key will be reindexed to match DataFrame index."
and returned a dataframe with several dates missing.
CodePudding user response:
How about setting the date column as the index and then filtering:
import pandas as pd
from datetime import datetime, timedelta
df = pd.DataFrame([
['2022-08-31',42],
['2022-09-01',30],
['2022-09-02',65],
['2022-09-03',55],
],columns=['date','aggregate'])
df.date=pd.to_datetime(df['date'])
df.set_index('date',inplace=True)
start_date = datetime.strptime("2022-08-31", "%Y-%m-%d")
end_date = start_date + timedelta(days=3)  # let's say I want a 3-day range
df[(df.index >= start_date ) & (df.index < end_date)]
aggregate
date
2022-08-31 42
2022-09-01 30
2022-09-02 65
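If you would rather keep date as a regular column, the same half-open filter can be sketched as below. This is only a minimal sketch: filter_date_range is a hypothetical helper name (not a pandas function), and it assumes the column holds values that pd.to_datetime can parse. That warning usually shows up when the boolean mask has a different index than the dataframe it is applied to (for example, when the mask was built from another dataframe), so here the mask is built from df itself.

```python
import pandas as pd


def filter_date_range(df, start, days, column="date"):
    """Return rows where start <= df[column] < start + days (half-open range).

    Hypothetical helper for illustration; not part of pandas.
    """
    dates = pd.to_datetime(df[column])        # compare datetimes, not strings
    start = pd.Timestamp(start)
    end = start + pd.Timedelta(days=days)
    mask = (dates >= start) & (dates < end)   # mask shares df's index, so no reindexing
    return df.loc[mask]


df = pd.DataFrame(
    {
        "date": ["2022-08-31", "2022-09-01", "2022-09-02", "2022-09-03"],
        "aggregate": [42, 30, 65, 55],
    }
)

result = filter_date_range(df, "2022-08-31", days=3)
print(result)
```

With the sample data this keeps the rows for 2022-08-31, 2022-09-01 and 2022-09-02 and drops 2022-09-03, matching the output above. Note that SQL's BETWEEN includes both bounds, so change < to <= if you want the end date included as well.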