Home > Blockchain >  How to group data by customized date logic?
How to group data by customized date logic?

Time:10-14

I have a dataframe that looks like this:

Date       | Apples | Bananas etc
2020-01-01 | 2      | 5
2020-02-01 | 12     | 44
2020-03-01 | 4      | 45

I want to create a grouping logic by date but the date must be transformed to the following:

If the Date is on or after February of the current year, then label the Date as the next respective Year, otherwise label as the current respective year. Example:

For Apples on 2020-01-01, it should be labelled as the year '2020' because it is prior to February of the current year. However, Bananas in 2020-03-01 would be labelled as '2021' because the date falls after February of the current year.

Date | Apples | Bananas etc
2020 | 2      | 5
2021 | 12     | 44
2021 | 4      | 45

How would this work the best?

CodePudding user response:

You could use np.where to do a conditional date offset.

import pandas as pd
import numpy as np

df = pd.DataFrame({'Date': ['2020-01-01', '2020-02-01', '2020-03-01'],
 'Apples': [2, 12, 4],
 'Bananas': [5, 44, 45]})

df['Date'] = pd.to_datetime(df['Date'])
# Add a year if the month is greater than 1
df['Date'] = np.where(df['Date'].dt.month>1, df['Date']  pd.offsets.DateOffset(years=1), df['Date'])



# If you want the actual dates
print(df.groupby('Date').sum())

            Apples  Bananas
Date                       
2020-01-01       2        5
2021-01-31      12       44
2021-03-01       4       45

# If you want grouped by year only
print(df.groupby(df['Date'].dt.year).sum())

      Apples  Bananas
Date                 
2020       2        5
2021      16       89
  • Related