How do I convert date (YYYY-MM-DD) to Month-YY and groupby on some other column to get minimum and m

Time:11-29

I have created a DataFrame that holds a rolling-quarter mapping, using the following code:

import numpy as np
import pandas as pd

abcd = pd.DataFrame()
abcd['Month'] = np.nan

abcd['Month'] = pd.date_range(start='2020-04-01', end='2022-04-01', freq = 'MS')

abcd['Time_1'] = np.arange(1, abcd.shape[0] + 1)
abcd['Time_2'] = np.arange(0, abcd.shape[0])
abcd['Time_3'] = np.arange(-1, abcd.shape[0]-1)


db_nd_ad_unpivot = pd.melt(abcd, id_vars=['Month'], 
                     value_vars=['Time_1', 'Time_2', 'Time_3',], 
                     var_name='Time_name', value_name='Time')
abcd_map = db_nd_ad_unpivot[(db_nd_ad_unpivot['Time'] > 0) & (db_nd_ad_unpivot['Time'] < abcd.shape[0] + 1)]
abcd_map = abcd_map[['Month','Time']]

The output of the code looks like this:

[Image: output of the rolling quarter code]
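Since the screenshot is not reproduced here, the following is a minimal sketch of the same mapping, written more compactly, that prints the months assigned to time instance 17. The names `months`, `long_form`, `mapping` and `window_17` are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd

# One row per month start, April 2020 through April 2022 (25 months).
months = pd.DataFrame(
    {"Month": pd.date_range(start="2020-04-01", end="2022-04-01", freq="MS")}
)
n = len(months)

# Each month is assigned to up to three consecutive time instances.
months["Time_1"] = np.arange(1, n + 1)
months["Time_2"] = np.arange(0, n)
months["Time_3"] = np.arange(-1, n - 1)

# Unpivot to one (Month, Time) pair per row and keep instances 1..n.
long_form = months.melt(
    id_vars="Month",
    value_vars=["Time_1", "Time_2", "Time_3"],
    var_name="Time_name",
    value_name="Time",
)
mapping = long_form.loc[long_form["Time"].between(1, n), ["Month", "Time"]]

# Months that belong to time instance 17, in chronological order.
window_17 = mapping.loc[mapping["Time"] == 17, "Month"].sort_values().tolist()
print(window_17)
```

Time instance 17 therefore covers August, September and October 2021.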

Next, I added a column that gives the month and year in Mon'YY format, using this code:

abcd_map['Month'] = pd.to_datetime(abcd_map.Month)
# abcd_map['Month'] = abcd_map['Month'].astype(str)
abcd_map['Time_Period'] = abcd_map['Month'].apply(lambda x: x.strftime("%b'%y"))

[Image: the same DataFrame with the month name added]

Now I want to find, for a specific time, the minimum and maximum of the Month column. For example, for time instance 17:

[Image: Time Instance 17]

A simple groupby gives:

Time  Time_Period
17    Aug'21-Sep'21

The desired output is:

Time  Time_Period
17    Aug'21-Oct'21

I think the problem is in how the min and max are taken: strftime converts the column to string/object dtype, so the values are compared alphabetically rather than chronologically.
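A small sketch of that effect, using only the three months of time instance 17. It assumes an English locale, so that `%b` yields `Aug`, `Sep` and `Oct`; the variable names are illustrative.

```python
import pandas as pd

# The three months that make up time instance 17.
months = pd.Series(pd.to_datetime(["2021-08-01", "2021-09-01", "2021-10-01"]))

# Formatting first turns the values into plain strings.
labels = months.dt.strftime("%b'%y")

# max() on strings is lexicographic: "Sep'21" sorts after "Oct'21".
string_max = labels.max()

# max() on datetimes is chronological; format only afterwards.
true_max = months.max().strftime("%b'%y")

print(string_max, true_max)
```

The minimum happens to agree in both cases here, because "Aug" is also the alphabetically smallest label, which is why only the upper bound looks wrong.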

CodePudding user response:

Do this:

abcd_map['Month_'] = pd.to_datetime(abcd_map['Month']).dt.strftime('%Y-%m')
# Time_Period holds strings, so the min/max below compare alphabetically, not chronologically.
abcd_map['Time_Period'] = abcd_map['Month'].apply(lambda x: x.strftime("%b'%y"))
df = abcd_map.groupby(['Time']).agg(
    sum_col=('Time', np.sum),
    first_date=('Time_Period', np.min),
    last_date=('Time_Period', np.max)
).reset_index()

df['TimePeriod'] = df['first_date'] + '-' + df['last_date']
df = df.drop(['first_date','last_date'], axis = 1)
df

which returns


    Time  sum_col     TimePeriod
0      1        3  Apr'20-May'20
1      2        6  Jul'20-May'20
2      3        9  Aug'20-Jun'20
3      4       12  Aug'20-Sep'20
4      5       15  Aug'20-Sep'20
5      6       18  Nov'20-Sep'20
6      7       21  Dec'20-Oct'20
7      8       24  Dec'20-Nov'20
8      9       27  Dec'20-Jan'21
9     10       30  Feb'21-Mar'21
10    11       33  Apr'21-Mar'21
11    12       36  Apr'21-May'21
12    13       39  Apr'21-May'21
13    14       42  Jul'21-May'21
14    15       45  Aug'21-Jun'21
15    16       48  Aug'21-Sep'21
16    17       51  Aug'21-Sep'21
17    18       54  Nov'21-Sep'21
18    19       57  Dec'21-Oct'21
19    20       60  Dec'21-Nov'21
20    21       63  Dec'21-Jan'22
21    22       66  Feb'22-Mar'22
22    23       69  Apr'22-Mar'22
23    24       48  Apr'22-Mar'22
24    25       25  Apr'22-Apr'22

CodePudding user response:

How about converting to strings after finding the min and max?

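New_df = (
    abcd_map.groupby('Time')['Month']
    .agg(['min', 'max'])                       # chronological min/max on datetimes
    .apply(lambda x: x.dt.strftime("%b'%y"))   # format only after aggregating
    .agg('-'.join, axis=1)                     # e.g. Aug'21-Oct'21
    .reset_index(name='Time_Period')
)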
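For reference, here is a self-contained sketch of the same idea (aggregate on the datetime column first, format afterwards) on a small hand-built frame. The helper `summarize_windows` and the `demo` data are hypothetical and only serve the illustration; an English locale is assumed for `%b`.

```python
import pandas as pd


def summarize_windows(mapping: pd.DataFrame) -> pd.DataFrame:
    """Return one row per Time with a "Mon'YY-Mon'YY" label.

    Hypothetical helper: expects a datetime column "Month" and a column "Time".
    """
    # Chronological bounds, computed while Month is still a datetime.
    bounds = mapping.groupby("Time")["Month"].agg(["min", "max"])
    fmt = "%b'%y"
    # Convert to strings only once the bounds are known.
    period = bounds["min"].dt.strftime(fmt) + "-" + bounds["max"].dt.strftime(fmt)
    return period.rename("Time_Period").reset_index()


# Two overlapping three-month windows, as in the rolling-quarter mapping.
demo = pd.DataFrame(
    {
        "Month": pd.to_datetime(
            ["2021-07-01", "2021-08-01", "2021-09-01",
             "2021-08-01", "2021-09-01", "2021-10-01"]
        ),
        "Time": [16, 16, 16, 17, 17, 17],
    }
)

result = summarize_windows(demo)
print(result)
```

With this input, time instance 17 is labelled Aug'21-Oct'21, which matches the desired output in the question.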