I have the following data (I purposely created a DateTime column from the string column of dates because that's how I am receiving the data):
import numpy as np
import pandas as pd
data = pd.DataFrame({"String_Date" : ['10/12/2021', '9/21/2021', '2/12/2010', '3/25/2009']})
#Create DateTime columns
data['Date'] = pd.to_datetime(data["String_Date"])
data
String_Date Date
0 10/12/2021 2021-10-12
1 9/21/2021 2021-09-21
2 2/12/2010 2010-02-12
3 3/25/2009 2009-03-25
I want to add the following "Month & Year Date" column with entries that are comparable (i.e. can determine whether Oct-12 < Sept-21):
String_Date Date Month & Year Date
0 10/12/2021 2021-10-12 Oct-12
1 9/21/2021 2021-09-21 Sept-21
2 2/12/2010 2010-02-12 Feb-12
3 3/25/2009 2009-03-25 Mar-25
The "Month & Year Date" column doesn't have to be in the exact format I show above (although bonus points if it does), just as long as it shows both the month (abbreviated name, full name, or month number) and the year in the same column. Most importantly, I want to be able to groupby the entries in the "Month & Year Date" column so that I can aggregate data in my original data set across every month.
CodePudding user response:
You can do:
data["Month & Year Date"] = (
data["Date"].dt.month_name() "-" data["Date"].dt.year.astype(str)
)
print(data)
Prints:
String_Date Date Month & Year Date
0 10/12/2021 2021-10-12 October-2021
1 9/21/2021 2021-09-21 September-2021
2 2/12/2010 2010-02-12 February-2010
3 3/25/2009 2009-03-25 March-2009
But if you want to group by month/year it's preferable to use:
data.groupby([data["Date"].dt.month, data["Date"].dt.year])
CodePudding user response:
data['Month & Year Date'] = data['Date'].dt.strftime('%b') '-' data['Date'].dt.strftime('%y')
print(data)
Outputs:
String_Date Date Month & Year Date
0 10/12/2021 2021-10-12 Oct-21
1 9/21/2021 2021-09-21 Sep-21
2 2/12/2010 2010-02-12 Feb-10
3 3/25/2009 2009-03-25 Mar-09
CodePudding user response:
You can use the .dt accessor to format your date field however you like. For your example, it'd look like this:
data['Month & Year Date'] = data['Date'].dt.strftime('%b-%y')
Although honestly I don't think that's the best representation for the purpose of sorting or evaluating greater than or less than. If what you want is essentially a truncated date, you could do this instead:
As a string -
data['Month & Year Date'] = data['Date'].dt.strftime('%Y-%m-01')
As a datetime object -
data['Month & Year Date'] = data['Date'].dt.to_period.dt.to_timestamp()
CodePudding user response:
You can use strftime
. You can find the formats here
data['Month Day'] = data['Date'].apply(lambda x:x.strftime('%b-%d'))