I'm learning Python, so sorry if this is a basic question, but I couldn't find the specific answer I was looking for in questions posted here previously.
I have the dataframe below, which lists grocery items, their prices, and the dates they were purchased. I want to create a new column called Year_Month, so that 03-22-2022 becomes 03-2022 in the new column. I'm not sure how to create the new column, or how to group the multiple dates by month. My end goal is to make a graph showing the change in grocery prices by month. For reference, my Date column is currently in datetime format. Thanks in advance for the help!
What my dataframe currently looks like:
Item | Price| Date
Apples | 2.99 | 03-22-2022
Chicken | 8.99 | 4-01-2022
Bananas | 2.59 | 04-25-2022
Rice | 5.69 | 5-06-2022
What I want it to look like:
Item | Price| Date | Year_Month
Apples | 2.99 | 03-22-2022 | 03-2022
Chicken | 8.99 | 4-01-2022 | 04-2022
Bananas | 2.59 | 04-25-2022 | 04-2022
Rice | 5.69 | 5-06-2022 | 05-2022
Answer:
One way is to use pandas.to_datetime with Series.dt.strftime:
import pandas as pd
df["Year_Month"] = pd.to_datetime(df["Date"]).dt.strftime("%m-%Y")
print(df)
Output:
Item Price Date Year_Month
0 Apples 2.99 03-22-2022 03-2022
1 Chicken 8.99 4-01-2022 04-2022
2 Bananas 2.59 04-25-2022 04-2022
3 Rice 5.69 5-06-2022 05-2022
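If the month column will be used for sorting, grouping, or plotting, a monthly Period may be handier than a string, because it keeps real date semantics and orders chronologically. A minimal sketch, assuming every Date string is month-day-year (the explicit format="%m-%d-%Y" is an assumption about the data):

```python
import pandas as pd

df = pd.DataFrame({
    "Item": ["Apples", "Chicken", "Bananas", "Rice"],
    "Price": [2.99, 8.99, 2.59, 5.69],
    "Date": ["03-22-2022", "4-01-2022", "04-25-2022", "5-06-2022"],
})

# Parse month-day-year strings (assumes all dates follow this layout)
df["Date"] = pd.to_datetime(df["Date"], format="%m-%d-%Y")

# A monthly Period column sorts chronologically and prints as YYYY-MM
df["Year_Month"] = df["Date"].dt.to_period("M")
print(df)
```

The Period values display as 2022-03, 2022-04 and so on; call .dt.strftime("%m-%Y") on the Period column if you still want the MM-YYYY text for display.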
Answer:
To do what you're asking, it probably makes sense to sort by Date as well as use Series.dt.strftime() to create the new Year_Month column:
import pandas as pd
df = pd.DataFrame({'Item':['Apples','Chicken','Bananas','Rice'],'Price':[2.99,8.99,2.59,5.69],'Date':['03-22-2023','4-01-2022','04-25-2022','5-06-2022']})
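# astype('datetime64') without a unit raises a TypeError in pandas 2.x, so parse explicitly
df['Date'] = pd.to_datetime(df['Date'], format='%m-%d-%Y')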
print(df)
df = df.sort_values('Date')
df['Year_Month'] = df.Date.dt.strftime('%m-%Y')
print(df)
Input (after parsing Date):
Item Price Date
0 Apples 2.99 2023-03-22
1 Chicken 8.99 2022-04-01
2 Bananas 2.59 2022-04-25
3 Rice 5.69 2022-05-06
Output:
Item Price Date Year_Month
1 Chicken 8.99 2022-04-01 04-2022
2 Bananas 2.59 2022-04-25 04-2022
3 Rice 5.69 2022-05-06 05-2022
0 Apples 2.99 2023-03-22 03-2023
Note that sort_values() ensures that the row with the 2023 date in the above example becomes the last row in the result.
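Sorting the rows by the datetime Date column works as described, but the "%m-%Y" labels themselves are plain strings, so sorting or grouping on them (groupby sorts its keys by default) compares them character by character and puts 03-2023 before 04-2022. A small sketch contrasting month-first and year-first labels, reusing the example dates from this answer:

```python
import pandas as pd

df = pd.DataFrame({
    "Price": [2.99, 8.99, 2.59, 5.69],
    "Date": pd.to_datetime(
        ["03-22-2023", "4-01-2022", "04-25-2022", "5-06-2022"], format="%m-%d-%Y"
    ),
})

# Month-first labels sort as text, so "03-2023" lands before "04-2022"
mdy = sorted(df["Date"].dt.strftime("%m-%Y").unique())

# Year-first labels sort in chronological order
ym = sorted(df["Date"].dt.strftime("%Y-%m").unique())

print(mdy)
print(ym)
```

If the labels will drive the x-axis of a chart, "%Y-%m" (or a Period column) avoids this ordering surprise.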
Answer:
# pandas 2.x requires a unit here, e.g. 'datetime64[ns]'
df['Year_Month'] = df['Date'].astype('datetime64[ns]').dt.strftime('%m-%Y')
df
Item Price Date Year_Month
0 Apples 2.99 03-22-2022 03-2022
1 Chicken 8.99 4-01-2022 04-2022
2 Bananas 2.59 04-25-2022 04-2022
3 Rice 5.69 5-06-2022 05-2022
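For the stated end goal (a graph of price changes by month), the rows still need to be aggregated per month. One possible sketch, assuming the average price per month is the figure of interest (mean is a choice made here, not something the question specifies):

```python
import pandas as pd

df = pd.DataFrame({
    "Item": ["Apples", "Chicken", "Bananas", "Rice"],
    "Price": [2.99, 8.99, 2.59, 5.69],
    "Date": ["03-22-2022", "4-01-2022", "04-25-2022", "5-06-2022"],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m-%d-%Y")

# Average price per calendar month; the Period index keeps months in date order
monthly = df.groupby(df["Date"].dt.to_period("M"))["Price"].mean()
print(monthly)
```

With matplotlib installed, monthly.plot() should then draw a line chart with one point per month.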