I have a DataFrame with a large number of rows. I would like to select all rows that fall within the last 5 weeks of my data set. I have tried using timedelta. I know the code below does not run, but it should give you an idea of what I am aiming for. I also tried nlargest, but it only gives me 5 values from the prior week.
gdf_hpo['weeknumber']=gdf_hpo[DATE].dt.week
gdf_hpo_2 = gdf_hpo.copy(deep=True)
get_last_week=gdf_hpo_2['weeknumber'].max()
week_prior_5=gdf_hpo_2['weeknumber'].max() - timedelta(weeks=5)
Do you have any ideas? Thanks a lot.
CodePudding user response:
Maybe you can try specifying the interval in days instead of weeks, so use days=35 rather than weeks=5. I saw a similar question on Stack Overflow that may help solve your issue: Selecting Data from Last Week in Python.
Good luck!
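As a quick sanity check on the units, here is a minimal sketch using only the standard library. It suggests that weeks=5 and days=35 describe the same duration, and that subtracting a timedelta from a plain Python integer (used here as a stand-in for a week number; the value 52 is just an illustrative assumption) is not a defined operation.

```python
from datetime import timedelta

# Both spellings describe the same duration: 5 weeks is 35 days.
five_weeks = timedelta(weeks=5)
thirty_five_days = timedelta(days=35)
same_duration = five_weeks == thirty_five_days

# A week number is just an integer; 52 is a made-up example value.
# Subtracting a timedelta from a plain int raises TypeError.
try:
    52 - five_weeks
    int_minus_timedelta_ok = True
except TypeError:
    int_minus_timedelta_ok = False

print(same_duration, int_minus_timedelta_ok)
```

So changing the unit alone is unlikely to change the outcome; what matters is whether the left-hand side is a date or a bare number.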
CodePudding user response:
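Just use boolean slicing; that should work. Since weeknumber is a plain number and not a date, subtract 5 from it directly instead of a timedelta, and combine the two conditions with & rather than and (and raises a ValueError on a pandas Series).
# Week number that marks the start of the 5-week window
week_prior_5 = gdf_hpo_2['weeknumber'].max() - 5
# Keep rows whose week number lies in (max - 5, max]
df_5weeks = gdf_hpo_2[(gdf_hpo_2.weeknumber <= gdf_hpo_2.weeknumber.max())
                      & (gdf_hpo_2.weeknumber > week_prior_5)]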
Let me know if this doesn't work.
CodePudding user response:
It is probably not a good idea to rely on the week number alone, because it also depends on the year. I recommend this instead:
cut_off_date = gdf_hpo['DATE'].max() - timedelta(weeks=5)
cut_off_dataset = gdf_hpo[gdf_hpo['DATE']>=cut_off_date]
Your mistake was combining a week number (which is just an integer) with a timedelta. A timedelta works with dates, so you should have subtracted the plain number 5 rather than a timedelta, and then filtered by week number. However, as I said, it is better to filter on dates rather than on week numbers alone.
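To illustrate the point about years, here is a small self-contained sketch of the date-based cut-off. The toy DataFrame, its date range, and the value column are made up for the example, and DATE is assumed to be a datetime column. It uses .dt.isocalendar().week to read week numbers, since the .dt.week accessor from the question is deprecated in newer pandas versions.

```python
from datetime import timedelta

import pandas as pd

# Toy data spanning a year boundary (hypothetical values, one row per day).
df = pd.DataFrame({
    "DATE": pd.date_range("2022-11-01", "2023-01-15", freq="D"),
})
df["value"] = range(len(df))

# Date-based cut-off: keep everything within 5 weeks of the latest date.
cut_off_date = df["DATE"].max() - timedelta(weeks=5)
last_5_weeks = df[df["DATE"] >= cut_off_date]

# Week numbers restart every year, so the largest week number in this
# data belongs to late December, not to the most recent date.
week_numbers = df["DATE"].dt.isocalendar().week
max_week = int(week_numbers.max())
latest_week = int(week_numbers.iloc[-1])

print(len(last_5_weeks), max_week, latest_week)
```

In this toy data the most recent rows are in ISO week 2, while the maximum week number is 52, so a filter built on weeknumber.max() would select December and skip January. The date-based filter does not have that problem.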