I have this dataframe with a "date" column in it. I want to count the rows where the date is in the past 7 days. What's the best way to do this? I feel like using an If and a counter isn't very pandas-esque.
Also, I'm importing the data from a SQL db. Should I just load it already filtered with a query? What's the most efficient way?
CodePudding user response:
My suggestion:
- First, calculate the interval datetimes: today and past 7 days.
import datetime
today = datetime.date.today()
past7 = today - datetime.timedelta(days=7)
- Use them to filter your dataframe:
df_filtered = df[(df['date'] >= past7) & (df['date'] <= today)]
- Get the df_filtered length:
print(len(df_filtered))
CodePudding user response:
Consider your dataframe is something like that:
df = pd.DataFrame({'date': ['2021-12-03', '2021-12-02', '2021-12-01', '2021-11-30'], 'data': [1, 2, 3, 4]})
date data
0 2021-12-03 1
1 2021-12-02 2
2 2021-12-01 3
3 2021-11-30 4
if you want to filter the data between dates 2021-11-30 and 2021-12-02, you can use the following command:
df_filtered = df.set_index('date').loc['2021-12-02':'2021-11-30'].reset_index()
date data
0 2021-12-02 2
1 2021-12-01 3
2 2021-11-30 4
In the first step, you set the date to the index and after that use .loc
method to filter your desired dates. In the final step, you can count the number of rows by using the len(df_filtered)
CodePudding user response:
try:
len(df[df['date'] > datetime.datetime.now() - pd.to_timedelta("7day")])