Function returns only one season from my do_season_on_date method

Time:09-02

I have a function that I want to use to map a date column to a new season column, but it only ever returns spring. I defined the seasons in a dictionary whose keys are season names and whose values are date ranges. I do not know why it returns only one season, since I have defined the dates for each season. Here is the code for the function:

def do_season_on_date(date):
    year = str(date.year)
    seasons = {'spring': pd.date_range(start='01/09/' + year, end='30/11/' + year),
               'summer': pd.date_range(start='01/12/' + year, end='28/02/' + year),
               'autumn': pd.date_range(start='01/03/' + year, end='31/05/' + year)}
    if date in seasons['spring']:
      return 'spring'
    elif date in seasons['summer']:
      return 'summer'
    elif date in seasons['autumn']:
      return 'autumn'
    else:
     return 'winter'

Here is the output

 date       ndvi        seasons
2000-02-29  0.331070    spring
2000-03-31  0.326608    spring
2000-04-30  0.300348    spring
2000-05-31  0.251368    spring
2000-06-30  0.216910    spring
2020-07-31  0.205169    spring
2020-08-31  0.198418    spring
2020-09-30  0.192516    spring
2020-10-31  0.201836    spring
2020-11-30  0.210474    spring

This is how I map the dates to seasons:

df_monthly['seasons'] = df_monthly.date.map(do_season_on_date)

CodePudding user response:

Your date strings are not being parsed the way you expect by pd.date_range; pandas handles ISO format (YYYY-MM-DD) most reliably. Apart from that, nice use of map. I just noticed that barmar already answered this in the comments, sorry I missed it.

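import pandas as pd

def do_season_on_date(date):
    year = date.year  # keep the year as an int so the leap-year test works
    # February has 29 days in a leap year
    if year % 400 == 0 or (year % 4 == 0 and year % 100 != 0):
        feb_end = 29
    else:
        feb_end = 28

    # ISO format (YYYY-MM-DD) is parsed unambiguously.
    # Summer wraps around the new year, so within one calendar year
    # it is December plus January and February.
    seasons = {'spring': pd.date_range(start=f'{year}-09-01', end=f'{year}-11-30'),
               'summer': pd.date_range(start=f'{year}-12-01', end=f'{year}-12-31').union(
                         pd.date_range(start=f'{year}-01-01', end=f'{year}-02-{feb_end}')),
               'autumn': pd.date_range(start=f'{year}-03-01', end=f'{year}-05-31')}
    if date in seasons['spring']:
        return 'spring'
    elif date in seasons['summer']:
        return 'summer'
    elif date in seasons['autumn']:
        return 'autumn'
    else:
        return 'winter'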

CodePudding user response:

  1. When I tried your code, I got:

UserWarning: Parsing '30/11/2019' in DD/MM/YYYY format. Provide format or specify infer_datetime_format=True for consistent parsing. exec(code_obj, self.user_global_ns, self.user_ns)

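It means that DD/MM/YYYY is ambiguous for pandas: by default it tries month-first, so '01/09/2019' is read as 9 January, and it only falls back to day-first when the first number cannot be a month, as in '30/11/2019'. That is why your spring range runs from 9 January to 30 November and swallows nearly every date.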
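To see how pandas reads such a string, here is a minimal sketch (the variable names are only for illustration). It parses the same string once with the default settings and once with an explicit format:

```python
import pandas as pd

# Default parsing tries month-first, so this is read as 9 January 2019.
ambiguous = pd.to_datetime('01/09/2019')

# With an explicit format the same string is read as 1 September 2019.
explicit = pd.to_datetime('01/09/2019', format='%d/%m/%Y')

print(ambiguous.date(), explicit.date())
```

Passing format (or writing the dates in ISO form) removes the guesswork.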

  2. date in pd.date_range(start=..., end=...) may not do what you expect.

pd.date_range creates a DatetimeIndex of timestamps, each separated by 1 day by default. So the in operator only checks whether your date exactly equals one of them. If your date falls between two of them (for example, because it has a time-of-day component), the result will be False.
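A small sketch of that membership check, using a made-up range and two made-up timestamps:

```python
import pandas as pd

# Daily timestamps at midnight from 1 September to 30 November 2020
days = pd.date_range(start='2020-09-01', end='2020-11-30')

midnight = pd.Timestamp('2020-09-30')         # exactly one of the generated values
afternoon = pd.Timestamp('2020-09-30 15:00')  # same day, but not at midnight

in_exact = midnight in days       # True: exact match with an index entry
in_between = afternoon in days    # False: no entry at 15:00

# Comparing against the endpoints works regardless of the time of day
in_interval = days[0] <= afternoon <= days[-1]  # True

print(in_exact, in_between, in_interval)
```

If your dates may carry a time component, comparing against the start and end is the safer test.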

  3. Also, calling a Python function row by row (which is what map does here) is not very efficient with pandas.

Here is what you can do:

According to this answer, the simplest way to filter for dates between a given month and day, independent of the year, is to check month and day individually (like df['Date'].dt.month == 11).

Here is the version that I would use:

Note: it uses the dates provided in the question, which don't actually correspond to reality.

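import numpy as np
import pandas as pd

# df is the DataFrame from the question (df_monthly there), with a datetime 'date' column.
# The year here will be ignored - but it has to be set. So just choose something.
seasons = {'spring': pd.to_datetime(['01/09/22', '30/11/22'], format = '%d/%m/%y'),
            'summer': pd.to_datetime(['01/12/22', '29/02/20'], format = '%d/%m/%y'),  # choose the leap year for February
            'autumn': pd.to_datetime(['01/03/22', '31/05/22'], format = '%d/%m/%y')}

# One boolean mask per season: month within range AND day within range
condlist = [df['date'].dt.month.between(start.month, end.month) &
            df['date'].dt.day.between(start.day, end.day)
            for (start, end) in seasons.values()]
choicelist = list(seasons.keys())

# A string default keeps a common dtype with the string choices
# (newer NumPy versions may reject default=np.nan here).
df['seasons'] = np.select(condlist, choicelist, default='nan')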

Result:

        date      ndvi seasons
0 2000-02-29  0.331070     nan
1 2000-03-31  0.326608  autumn
2 2000-04-30  0.300348  autumn
3 2000-05-31  0.251368  autumn
4 2000-06-30  0.216910     nan
5 2020-07-31  0.205169     nan
6 2020-08-31  0.198418     nan
7 2020-09-30  0.192516  spring
8 2020-10-31  0.201836     nan
9 2020-11-30  0.210474  spring

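Note: as I said before, the date ranges don't correspond to reality, so don't be surprised that the seasons are wrong ;) Be aware too that month and day are checked independently: 2020-10-31 gets nan because day 31 lies outside spring's 1 to 30 day window, and summer never matches because its range wraps around the year end (month 12 to month 2).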
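If all you need is the season for each row, a simpler option is to map the month number straight to a season name. This is only a sketch: it assumes the month boundaries from the question (spring = September to November, summer = December to February, autumn = March to May, everything else winter), and the sample DataFrame and the month_to_season name are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with a datetime 'date' column
df = pd.DataFrame({'date': pd.to_datetime(
    ['2000-02-29', '2000-03-31', '2000-06-30', '2020-10-31', '2020-12-15'])})

# Month number -> season, following the ranges given in the question
month_to_season = {9: 'spring', 10: 'spring', 11: 'spring',
                   12: 'summer', 1: 'summer', 2: 'summer',
                   3: 'autumn', 4: 'autumn', 5: 'autumn'}

# Months missing from the dict (June to August) become NaN, then 'winter'
df['seasons'] = df['date'].dt.month.map(month_to_season).fillna('winter')

print(df)
```

Because it only looks at the month, this avoids both the string-parsing problem and the year wrap-around for summer, and it is vectorised.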
