I have a dataframe that looks like this:
Date | Apples | Bananas etc
2020-01-01 | 2 | 5
2020-02-01 | 12 | 44
2020-03-01 | 4 | 45
I want to create a grouping logic by date but the date must be transformed to the following:
If the Date is on or after February of the current year, then label the Date as the next respective Year, otherwise label as the current respective year. Example:
For Apples on 2020-01-01, it should be labelled as the year '2020' because it is prior to February of the current year. However, Bananas in 2020-03-01 would be labelled as '2021' because the date falls after February of the current year.
Date | Apples | Bananas etc
2020 | 2 | 5
2021 | 12 | 44
2021 | 4 | 45
How would this work the best?
CodePudding user response:
You could use np.where
to do a conditional date offset.
import pandas as pd
import numpy as np
df = pd.DataFrame({'Date': ['2020-01-01', '2020-02-01', '2020-03-01'],
'Apples': [2, 12, 4],
'Bananas': [5, 44, 45]})
df['Date'] = pd.to_datetime(df['Date'])
# Add a year if the month is greater than 1
df['Date'] = np.where(df['Date'].dt.month>1, df['Date'] pd.offsets.DateOffset(years=1), df['Date'])
# If you want the actual dates
print(df.groupby('Date').sum())
Apples Bananas
Date
2020-01-01 2 5
2021-01-31 12 44
2021-03-01 4 45
# If you want grouped by year only
print(df.groupby(df['Date'].dt.year).sum())
Apples Bananas
Date
2020 2 5
2021 16 89