I have a function that should map a date column to a new column, seasons, but it only ever returns spring. I defined the seasons in a dictionary whose keys are season names and whose values are date ranges. I do not know why it returns only one season, since I have defined the dates for each season. Here is the code for the function:
def do_season_on_date(date):
    year = str(date.year)
    seasons = {'spring': pd.date_range(start='01/09/' + year, end='30/11/' + year),
               'summer': pd.date_range(start='01/12/' + year, end='28/02/' + year),
               'autumn': pd.date_range(start='01/03/' + year, end='31/05/' + year)}
    if date in seasons['spring']:
        return 'spring'
    elif date in seasons['summer']:
        return 'summer'
    elif date in seasons['autumn']:
        return 'autumn'
    else:
        return 'winter'
Here is the output
date ndvi seasons
2000-02-29 0.331070 spring
2000-03-31 0.326608 spring
2000-04-30 0.300348 spring
2000-05-31 0.251368 spring
2000-06-30 0.216910 spring
2020-07-31 0.205169 spring
2020-08-31 0.198418 spring
2020-09-30 0.192516 spring
2020-10-31 0.201836 spring
2020-11-30 0.210474 spring
This is how I map the dates to seasons:
df_monthly['seasons'] = df_monthly.date.map(do_season_on_date)
CodePudding user response:
Your date strings are not being parsed the way you expect by pd.date_range; pandas works most reliably with ISO format (YYYY-MM-DD). The map itself is fine. I just noticed that barmar had already answered this in the comments; sorry, I didn't see it earlier.
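import pandas as pd

def do_season_on_date(date):
    year = date.year  # keep the year as an int so the leap-year test works
    # February ends on the 29th in a leap year
    is_leap = year % 400 == 0 or (year % 4 == 0 and year % 100 != 0)
    feb_end = 29 if is_leap else 28
    # ISO strings (YYYY-MM-DD) are parsed unambiguously
    seasons = {'spring': pd.date_range(start=f'{year}-09-01', end=f'{year}-11-30'),
               # summer spans the new year, so join December with January and February
               'summer': pd.date_range(start=f'{year}-12-01', end=f'{year}-12-31').union(
                   pd.date_range(start=f'{year}-01-01', end=f'{year}-02-{feb_end}')),
               'autumn': pd.date_range(start=f'{year}-03-01', end=f'{year}-05-31')}
    if date in seasons['spring']:
        return 'spring'
    elif date in seasons['summer']:
        return 'summer'
    elif date in seasons['autumn']:
        return 'autumn'
    else:
        return 'winter'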
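To see the parsing issue in isolation, here is a small sketch (the variable names are only illustrative). It compares how pandas reads the same day/month string with and without an explicit format:

```python
import pandas as pd

# Without a format, pandas reads an ambiguous string month-first.
ambiguous = pd.Timestamp('01/09/2000')

# An explicit format (or an ISO string) removes the guesswork.
explicit = pd.to_datetime('01/09/2000', format='%d/%m/%Y')
iso = pd.Timestamp('2000-09-01')

print(ambiguous.month, explicit.month, iso.month)  # 1 9 9
```

So '01/09/2000' on its own becomes 9 January rather than 1 September, which is why the ranges in the question do not start where they were meant to.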
CodePudding user response:
- When I tried your code, I got:
UserWarning: Parsing '30/11/2019' in DD/MM/YYYY format. Provide format or specify infer_datetime_format=True for consistent parsing. exec(code_obj, self.user_global_ns, self.user_ns)
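It means that pandas does not treat DD/MM/YYYY as a standard format, so it has to guess the day and month order for each string. '01/09/2000' is read month-first as 9 January, while '30/11/2000' can only be 30 November, so the 'spring' range covers almost the whole year.
- date in pd.date_range(start=..., end=...) may not do what you expect. pd.date_range creates a DatetimeIndex of timestamps, each separated by 1 day by default. So the in operator only checks whether your date exactly equals one of these. If your date falls anywhere between two of them, the result will be False.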
- Also, looping is not very efficient with pandas.
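The point about membership in a DatetimeIndex can be checked with a short sketch (the dates are arbitrary examples, not taken from the question's data):

```python
import pandas as pd

# Three midnight timestamps: 1, 2 and 3 March 2000
days = pd.date_range(start='2000-03-01', end='2000-03-03')

at_midnight = pd.Timestamp('2000-03-02')
at_noon = pd.Timestamp('2000-03-02 12:00')

print(at_midnight in days)  # True: exactly equal to one element
print(at_noon in days)      # False: inside the range, but not an element

# Comparing against the end points tests the interval itself.
print(days[0] <= at_noon <= days[-1])  # True
```

For month-end dates stored at midnight, as in the question, the in test happens to work, but an explicit comparison against the start and end is more robust.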
Here is what you can do:
According to this answer, the simplest way to filter for dates between a given month and day, independent of the year, is to check month and day individually (like df['Date'].dt.month == 11).
Here is the version that I would use:
Note: it uses the dates provided in the question, which don't actually correspond to reality.
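import numpy as np
import pandas as pd

# df is the DataFrame holding the 'date' column (df_monthly in the question).
# The year here will be ignored - but it has to be set. So just choose something.
seasons = {'spring': pd.to_datetime(['01/09/22', '30/11/22'], format='%d/%m/%y'),
           'summer': pd.to_datetime(['01/12/22', '29/02/20'], format='%d/%m/%y'),  # choose a leap year for February
           'autumn': pd.to_datetime(['01/03/22', '31/05/22'], format='%d/%m/%y')}
# One boolean mask per season: month and day are each tested against the range
condlist = [df['date'].dt.month.between(start.month, end.month) &
            df['date'].dt.day.between(start.day, end.day)
            for (start, end) in seasons.values()]
choicelist = list(seasons.keys())
# A string default matches the string choices (np.nan would be cast to the
# string 'nan' or rejected, depending on the NumPy version)
df['seasons'] = np.select(condlist, choicelist, default='nan')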
Result:
date ndvi seasons
0 2000-02-29 0.331070 nan
1 2000-03-31 0.326608 autumn
2 2000-04-30 0.300348 autumn
3 2000-05-31 0.251368 autumn
4 2000-06-30 0.216910 nan
5 2020-07-31 0.205169 nan
6 2020-08-31 0.198418 nan
7 2020-09-30 0.192516 spring
8 2020-10-31 0.201836 nan
9 2020-11-30 0.210474 spring
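Note: as I said before, the date ranges don't correspond to reality, so don't be surprised that the seasons are wrong ;) Some of the nan rows also come from the check itself rather than from the ranges: month and day are tested independently, so 31 October fails the day test (1 to 30), and the summer range never matches because its start month (12) is greater than its end month (2). The June to August rows simply match no range.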
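If the month-end and year-wrap cases matter, one possible variation is to compare month and day together as a single number. This is only a sketch under my own assumptions: the SEASONS table, the assign_seasons helper and the MMDD encoding are illustrative names and choices, not part of the code above, and 'winter' is used as the default as in the question.

```python
import numpy as np
import pandas as pd

# Season boundaries as (month, day) pairs, taken from the question.
# A season whose start is later in the year than its end wraps past 31 December.
SEASONS = {
    'spring': ((9, 1), (11, 30)),
    'summer': ((12, 1), (2, 29)),
    'autumn': ((3, 1), (5, 31)),
}

def assign_seasons(dates: pd.Series, default: str = 'winter') -> pd.Series:
    # Encode month and day as one integer (MMDD) so they are compared together.
    key = dates.dt.month * 100 + dates.dt.day
    condlist = []
    for (start_m, start_d), (end_m, end_d) in SEASONS.values():
        start, end = start_m * 100 + start_d, end_m * 100 + end_d
        if start <= end:
            condlist.append(key.between(start, end))
        else:  # the range wraps around the new year
            condlist.append((key >= start) | (key <= end))
    labels = np.select(condlist, list(SEASONS), default=default)
    return pd.Series(labels, index=dates.index)

df = pd.DataFrame({'date': pd.to_datetime(
    ['2000-02-29', '2000-06-30', '2020-10-31', '2020-11-30'])})
df['seasons'] = assign_seasons(df['date'])
print(df)
```

With this sample, 29 February maps to summer, 30 June to winter, and both 31 October and 30 November to spring.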