How to combine multiple mm-dd-yy values into a year_month column? (python)

Time:06-16

I'm learning Python, so sorry if this is a basic question, but I couldn't find the specific answer I was looking for in questions posted here previously.

I have the dataframe below, which lists grocery items, their prices, and the dates purchased. I want to create a new column called year_month, so that 03-22-2022 becomes 03-2022 in the new column. I'm not sure how to create the new column and also group(?) the multiple dates by month. My end goal is to make a graph showing the change in grocery prices by month. For reference, my date column is currently in datetime format. Thanks in advance for the help!

What my dataframe currently looks like:

Item    | Price| Date
Apples  | 2.99 | 03-22-2022
Chicken | 8.99 | 4-01-2022
Bananas | 2.59 | 04-25-2022
Rice    | 5.69 | 5-06-2022

What I want it to look like:

Item    | Price| Date       | Year_Month
Apples  | 2.99 | 03-22-2022 | 03-2022
Chicken | 8.99 | 4-01-2022  | 04-2022
Bananas | 2.59 | 04-25-2022 | 04-2022
Rice    | 5.69 | 5-06-2022  | 05-2022

CodePudding user response:

One way using pandas.to_datetime with strftime:

df["Year_Month"] = pd.to_datetime(df["Date"]).dt.strftime("%m-%Y")
print(df)

Output:

      Item  Price        Date Year_Month
0   Apples   2.99  03-22-2022    03-2022
1  Chicken   8.99   4-01-2022    04-2022
2  Bananas   2.59  04-25-2022    04-2022
3     Rice   5.69   5-06-2022    05-2022
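
The question also asks about grouping the dates by month for a graph. A minimal sketch of one way to do that, assuming the average price per month is what should be plotted (the question does not say which aggregate it wants) and using a helper column named Period, which is a made-up name for illustration:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Item": ["Apples", "Chicken", "Bananas", "Rice"],
    "Price": [2.99, 8.99, 2.59, 5.69],
    "Date": ["03-22-2022", "4-01-2022", "04-25-2022", "5-06-2022"],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m-%d-%Y")

# A monthly Period sorts chronologically, unlike a "%m-%Y" string label.
# "Period" is a hypothetical column name, not one from the question.
df["Period"] = df["Date"].dt.to_period("M")

# Average price per month; the choice of mean() is an assumption.
monthly = df.groupby("Period")["Price"].mean()
print(monthly)
```

With matplotlib installed, monthly.plot() should then draw the month-by-month line. The "%m-%Y" string column is still fine as a display label.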

CodePudding user response:

To do what you're asking, it probably makes sense to sort by Date and then use Series.dt.strftime() to create the new Year_Month column:

import pandas as pd

df = pd.DataFrame({
    'Item': ['Apples', 'Chicken', 'Bananas', 'Rice'],
    'Price': [2.99, 8.99, 2.59, 5.69],
    'Date': ['03-22-2023', '4-01-2022', '04-25-2022', '5-06-2022'],
})
# Parse month-day-year strings explicitly (unit-less astype('datetime64') is rejected by pandas 2.x)
df['Date'] = pd.to_datetime(df['Date'], format='%m-%d-%Y')
print(df)
df = df.sort_values('Date')
df['Year_Month'] = df['Date'].dt.strftime('%m-%Y')
print(df)

Input:

      Item  Price       Date
0   Apples   2.99 2023-03-22
1  Chicken   8.99 2022-04-01
2  Bananas   2.59 2022-04-25
3     Rice   5.69 2022-05-06

Output:

      Item  Price       Date Year_Month
1  Chicken   8.99 2022-04-01    04-2022
2  Bananas   2.59 2022-04-25    04-2022
3     Rice   5.69 2022-05-06    05-2022
0   Apples   2.99 2023-03-22    03-2023

Note that sort_values() ensures that the row with the 2023 date in the above example becomes the last row in the result.
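
One caveat worth illustrating: the "%m-%Y" strings themselves do not sort chronologically across years, which is why sorting on Date (rather than on Year_Month) matters here. A small sketch of the difference, where Month_Key is a hypothetical year-first column added only for comparison:

```python
import pandas as pd

df = pd.DataFrame({
    "Item": ["Apples", "Chicken", "Bananas", "Rice"],
    "Date": pd.to_datetime(
        ["03-22-2023", "4-01-2022", "04-25-2022", "5-06-2022"],
        format="%m-%d-%Y",
    ),
})
df["Year_Month"] = df["Date"].dt.strftime("%m-%Y")  # month-first label
df["Month_Key"] = df["Date"].dt.strftime("%Y-%m")   # hypothetical year-first label

# String sort on the month-first label puts March 2023 before April 2022.
by_label = df.sort_values("Year_Month", kind="stable")["Item"].tolist()
# String sort on the year-first label matches chronological order.
by_key = df.sort_values("Month_Key", kind="stable")["Item"].tolist()

print(by_label)
print(by_key)
```

So if the month label is ever used as a sort or group key, a year-first format (or the Date column itself) is the safer choice.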

CodePudding user response:

# pd.to_datetime replaces astype('datetime64'), which pandas 2.x rejects without a unit
df['Year_Month'] = pd.to_datetime(df['Date'], format='%m-%d-%Y').dt.strftime('%m-%Y')
df

      Item  Price        Date Year_Month
0   Apples   2.99  03-22-2022    03-2022
1  Chicken   8.99   4-01-2022    04-2022
2  Bananas   2.59  04-25-2022    04-2022
3     Rice   5.69   5-06-2022    05-2022