Home > Enterprise >  Pandas - How to Create Month and Year Column from DateTime Column
Pandas - How to Create Month and Year Column from DateTime Column

Time:10-22

I have the following data (I purposely created a DateTime column from the string column of dates because that's how I am receiving the data):

    import numpy as np
    import pandas as pd

    data = pd.DataFrame({"String_Date" : ['10/12/2021', '9/21/2021', '2/12/2010', '3/25/2009']})

    #Create DateTime columns
    data['Date'] = pd.to_datetime(data["String_Date"])

    data

       String_Date  Date
    0  10/12/2021   2021-10-12
    1  9/21/2021    2021-09-21
    2  2/12/2010    2010-02-12
    3  3/25/2009    2009-03-25

I want to add the following "Month & Year Date" column with entries that are comparable (i.e. can determine whether Oct-12 < Sept-21):

       String_Date  Date        Month & Year Date   
    0  10/12/2021   2021-10-12  Oct-12
    1  9/21/2021    2021-09-21  Sept-21
    2  2/12/2010    2010-02-12  Feb-12
    3  3/25/2009    2009-03-25  Mar-25

The "Month & Year Date" column doesn't have to be in the exact format I show above (although bonus points if it does), just as long as it shows both the month (abbreviated name, full name, or month number) and the year in the same column. Most importantly, I want to be able to groupby the entries in the "Month & Year Date" column so that I can aggregate data in my original data set across every month.

CodePudding user response:

You can do:

data["Month & Year Date"] = (
    data["Date"].dt.month_name()   "-"   data["Date"].dt.year.astype(str)
)

print(data)

Prints:

  String_Date       Date Month & Year Date
0  10/12/2021 2021-10-12      October-2021
1   9/21/2021 2021-09-21    September-2021
2   2/12/2010 2010-02-12     February-2010
3   3/25/2009 2009-03-25        March-2009

But if you want to group by month/year it's preferable to use:

data.groupby([data["Date"].dt.month, data["Date"].dt.year])

CodePudding user response:

data['Month & Year Date'] = data['Date'].dt.strftime('%b')   '-'   data['Date'].dt.strftime('%y')

print(data)

Outputs:

    String_Date Date    Month & Year Date
0   10/12/2021  2021-10-12  Oct-21
1   9/21/2021   2021-09-21  Sep-21
2   2/12/2010   2010-02-12  Feb-10
3   3/25/2009   2009-03-25  Mar-09

CodePudding user response:

You can use the .dt accessor to format your date field however you like. For your example, it'd look like this:

data['Month & Year Date'] = data['Date'].dt.strftime('%b-%y')

Although honestly I don't think that's the best representation for the purpose of sorting or evaluating greater than or less than. If what you want is essentially a truncated date, you could do this instead:

As a string - data['Month & Year Date'] = data['Date'].dt.strftime('%Y-%m-01')

As a datetime object - data['Month & Year Date'] = data['Date'].dt.to_period.dt.to_timestamp()

CodePudding user response:

You can use strftime. You can find the formats here

data['Month Day'] = data['Date'].apply(lambda x:x.strftime('%b-%d'))
  • Related