Column values based on month and year of 'date' column

Time:07-17

I am working with a dataframe of more than 25,000 rows whose data spans 2012 to 2021. It has a date column, and I want to create another column called assessmentYear (AY) whose values are derived from the year and month of each date. Each assessment year starts in October of a given year and closes in May of the next year. I have used the pandas accessors dt.year and dt.month_name to extract these values into separate columns. However, I cannot work out how to store values in the AY column (AY1, AY2, AY3, ...) based on the assessment period.

A sample of the Date column is shared as a dictionary in the snippet below.

{
 1: '2019-09-19',
 2: '2019-09-20',
 3: '2019-10-29',
 4: '2019-10-30',
 5: '2020-04-01',
 6: '2020-04-02',
 7: '2020-04-03',
 8: '2020-04-04',
 9: '2020-11-05',
 10: '2020-11-06',
 11: '2020-11-07',
 12: '2020-11-08',
 13: '2020-11-09',
 14: '2021-04-10',
 15: '2021-04-11',
 16: '2021-04-12',
 }

Rows up to row 2 in the above dictionary should be assigned the value AY1, the following rows up to the end of the next assessment period should be AY2, and the remaining rows should be assigned AY3 in the AY column. Instead of relying on the index, I want to implement this based on the date column, because the dates vary in the real dataset. I would appreciate any help on how to proceed with this problem.

CodePudding user response:

Use to_period to convert each date to a quarterly period whose fiscal year ends in September, then take qyear to get the fiscal year:

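import pandas as pd
# Fiscal year of each date, with the fiscal year ending in September
fyear = pd.to_datetime(df['Date']).dt.to_period('Q-SEP').dt.qyear
# Number the fiscal years 1, 2, 3, ... in sorted order and prefix with 'AY'
df['AY'] = 'AY' + df.groupby(fyear).ngroup().add(1).astype(str)
print(df)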

# Output
          Date   AY
1   2019-09-19  AY1
2   2019-09-20  AY1
3   2019-10-29  AY2
4   2019-10-30  AY2
5   2020-04-01  AY2
6   2020-04-02  AY2
7   2020-04-03  AY2
8   2020-04-04  AY2
9   2020-11-05  AY3
10  2020-11-06  AY3
11  2020-11-07  AY3
12  2020-11-08  AY3
13  2020-11-09  AY3
14  2021-04-10  AY3
15  2021-04-11  AY3
16  2021-04-12  AY3
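
For anyone who wants to try this without the full dataset, here is a minimal self-contained sketch. It rebuilds a subset of the sample dates from the question and computes the same fiscal year with plain arithmetic (year plus one for months from October onward) instead of to_period. The helper name add_assessment_year and its parameters are hypothetical, introduced only for illustration, and the sketch assumes the Date column holds ISO-formatted strings as in the sample.

```python
import pandas as pd

# A subset of the sample dates from the question, keyed by row label.
dates = {
    1: '2019-09-19',
    2: '2019-09-20',
    3: '2019-10-29',
    5: '2020-04-01',
    9: '2020-11-05',
    14: '2021-04-10',
}
df = pd.DataFrame({'Date': pd.Series(dates)})


def add_assessment_year(frame, date_col='Date', out_col='AY'):
    """Hypothetical helper: label rows AY1, AY2, ... by October-start year."""
    dt = pd.to_datetime(frame[date_col])
    # Dates from October onward count toward the year ending the next September.
    fyear = dt.dt.year + (dt.dt.month >= 10).astype(int)
    # Dense rank gives 1, 2, 3, ... for the distinct fiscal years present.
    rank = fyear.rank(method='dense').astype(int)
    out = frame.copy()
    out[out_col] = 'AY' + rank.astype(str)
    return out


result = add_assessment_year(df)
print(result)
```

Two things to keep in mind with either approach. Dates from June to September fall outside the October to May window described in the question, and here they are grouped with the assessment year that precedes them, which is what the sample expects for September 2019 (AY1). The numbering is also relative to the fiscal years actually present in the data, so AY1 is simply the earliest one found.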