Convert Date String to Separate Columns as Year, Month and Day

Time:07-21

My column "released" contains strings such as "Nov 21, 2000", "Nov, 2000" and "2000". I want to create three columns, "year", "month" and "day". When the month or the day is missing, the corresponding value should be None.

I cannot use pd.to_datetime because it fills in the missing parts: my string "Nov, 2000" becomes "Nov 01, 2000" and my string "2000" becomes "Jan 01, 2000". So I currently use this code:

df["year"] = df.released.str[-4:]
df["month_date"] = df.released.str[:-6]
df["month"] = df.month_date.str[0:3]
df["day"] = df.month_date.apply(lambda row: row[-2:] if len(row) > 5 else None)
del df["month_date"]

But I want to know whether there is a cleverer way to achieve this.

Here is a sample of the data in "released":

0       Nov 21, 2000
1       Sep 08, 2000
2          Nov, 1999
3       Nov 26, 1998
4          Apr, 2002
            ...
2699            2001
2700    Mar 26, 1998
2701    Oct 07, 1999
2702    Nov 20, 1995
2703    Mar 15, 1996

CodePudding user response:

You can extract the day, month and year from the column using regular expressions (the column is named 'date' here):

df['year'] = df['date'].str.extract(r'(\d{4})', expand=False)
df['day'] = df['date'].str.extract(r'\s(\d{2}),', expand=False)
df['month'] = df['date'].str.extract(r'([A-Za-z]+)', expand=False)

Or you can do the same in a single statement using pandas' str.extractall:

p = r'(\d{4})|\s(\d{2}),|([a-zA-Z]+)'
df[['year', 'day', 'month']] = df['date'].str.extractall(p).groupby(level=0).first()

This gives us

           date  year  day month
0  Nov 21, 2000  2000   21   Nov
1  Sep 08, 2000  2000   08   Sep
2     Nov, 1999  1999  NaN   Nov
3  Nov 26, 1998  1998   26   Nov
4     Apr, 2002  2002  NaN   Apr
5          2001  2001  NaN   NaN
6  Mar 26, 1998  1998   26   Mar
7  Oct 07, 1999  1999   07   Oct
8  Nov 20, 1995  1995   20   Nov
9  Mar 15, 1996  1996   15   Mar
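
A variation on the same idea, as a minimal sketch rather than a definitive solution: a single anchored pattern with named groups lets str.extract return all three columns in one call, and a string that fits none of the three layouts yields NaN in every column instead of a partial match. The sample frame, the pattern and the name `parts` are assumptions made for illustration. The pattern assumes a three-letter month abbreviation and a two-digit day, as in the data shown.

```python
import pandas as pd

# Small sample in the three layouts from the question (illustrative data).
df = pd.DataFrame({"date": ["Nov 21, 2000", "Nov, 1999", "2001"]})

# Optional "Mon" or "Mon DD" prefix followed by ", ", then a mandatory year.
pattern = (
    r"^(?:(?P<month>[A-Za-z]{3})"   # three-letter month abbreviation
    r"(?:\s(?P<day>\d{2}))?"        # optional two-digit day
    r",\s)?"                        # the whole month/day prefix is optional
    r"(?P<year>\d{4})$"             # four-digit year is always required
)

# Named groups become column names; missing parts come back as NaN.
parts = df["date"].str.extract(pattern)
df = df.join(parts[["year", "month", "day"]])
print(df)
```

The extracted values are strings, so convert them afterwards if numeric columns are needed.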