I am working with a DataFrame of more than 25,000 rows, spanning 2012 to 2021. It has a date column, and I want to create another column, assessmentYear (AY), whose values are derived from the year and month of each date. Each assessment year starts in October of a given year and closes in May of the next year. I have used Series.dt.year and Series.dt.month_name() to extract these values into separate columns. However, I cannot work out how to fill the AY column (AY1, AY2, AY3, ...) based on the assessment period.
A sample of the Date column is shared as a dictionary in the snippet below.
{
1: '2019-09-19',
2: '2019-09-20',
3: '2019-10-29',
4: '2019-10-30',
5: '2020-04-01',
6: '2020-04-02',
7: '2020-04-03',
8: '2020-04-04',
9: '2020-11-05',
10: '2020-11-06',
11: '2020-11-07',
12: '2020-11-08',
13: '2020-11-09',
14: '2021-04-10',
15: '2021-04-11',
16: '2021-04-12',
}
Rows up to row 2 in the above dictionary should be assigned AY1, the rows in the following assessment period should be assigned AY2, and the remaining rows should be assigned AY3 in the AY column. Rather than relying on the index, I want to base this on the date column, because the dates vary in the real dataset. I would appreciate any help on how to proceed with this problem.
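For reproducibility, here is a minimal sketch of loading the sample into a DataFrame. It assumes the column is named Date; the variable names sample and df are only illustrative.

```python
import pandas as pd

# Sample dates from the dictionary above, keyed by row label
sample = {
    1: '2019-09-19', 2: '2019-09-20',
    3: '2019-10-29', 4: '2019-10-30',
    5: '2020-04-01', 6: '2020-04-02', 7: '2020-04-03', 8: '2020-04-04',
    9: '2020-11-05', 10: '2020-11-06', 11: '2020-11-07',
    12: '2020-11-08', 13: '2020-11-09',
    14: '2021-04-10', 15: '2021-04-11', 16: '2021-04-12',
}

# Build a one-column DataFrame; 'Date' is the assumed column name
df = pd.DataFrame({'Date': pd.Series(sample)})

# Parse the strings into datetime64 values so the .dt accessor is available
df['Date'] = pd.to_datetime(df['Date'])
```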
CodePudding user response:
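Use to_period to convert the dates to quarters of a fiscal year ending in September (so each fiscal year starts in October), then qyear to get the fiscal year of each quarter: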
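import pandas as pd
# 'Q-SEP' anchors quarters to a fiscal year ending in September; qyear is that fiscal year
fyear = pd.to_datetime(df['Date']).dt.to_period('Q-SEP').dt.qyear
# Number the fiscal years present in the data 1, 2, 3, ... and prefix them with 'AY'
df['AY'] = 'AY' + df.groupby(fyear).ngroup().add(1).astype(str)
print(df)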
# Output
          Date   AY
1   2019-09-19  AY1
2   2019-09-20  AY1
3   2019-10-29  AY2
4   2019-10-30  AY2
5   2020-04-01  AY2
6   2020-04-02  AY2
7   2020-04-03  AY2
8   2020-04-04  AY2
9   2020-11-05  AY3
10  2020-11-06  AY3
11  2020-11-07  AY3
12  2020-11-08  AY3
13  2020-11-09  AY3
14  2021-04-10  AY3
15  2021-04-11  AY3
16  2021-04-12  AY3
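The same labels can also be derived with plain year and month arithmetic, which may be easier to adapt. This is only a sketch under two assumptions: the column is named Date, and dates from June to September count toward the assessment year that started the previous October (which is how the September rows end up in AY1 above). The helper name assessment_year_labels is hypothetical.

```python
import pandas as pd


def assessment_year_labels(dates: pd.Series) -> pd.Series:
    """Label each date AY1, AY2, ... by October-start year (hypothetical helper)."""
    dates = pd.to_datetime(dates)
    # October-December dates belong to the period ending the following September,
    # so they are shifted into the next calendar year
    end_year = dates.dt.year + (dates.dt.month >= 10).astype(int)
    # Number relative to the earliest period present in the data
    number = end_year - end_year.min() + 1
    return 'AY' + number.astype(str)


# A shortened version of the sample data
df = pd.DataFrame({'Date': ['2019-09-19', '2019-10-29', '2020-04-01',
                            '2020-11-05', '2021-04-12']})
df['AY'] = assessment_year_labels(df['Date'])
print(df)
```

One difference worth noting: ngroup numbers only the fiscal years that actually occur in the data, so labels stay consecutive even when a year has no rows, whereas this arithmetic version keeps a slot for the missing year. Which behaviour is preferable depends on how the AY labels are used downstream.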