Home > OS >  cleaning date columns in python
cleaning date columns in python

Time:12-03

Kindly assist me in cleaning my date types in python.

My sample data is as follows:

INITIATION DATE DATE CUT DATE GIVEN
1/July/2022 21 July 2022 11-July-2022
17-July-2022 16/July/2022 21/July/2022
16-July-2022 01-July-2022 09/July/2022
19-July-2022 31 July 2022 27 July 2022

How do I remove all dashes/slashes/hyphens from dates in the different columns? I have 8 columns and 300 rows.

What i tried:

df[['INITIATION DATE', 'DATE CUT', 'DATE GIVEN']]= df[['INITIATION DATE', 'DATE CUT', 'DATE GIVEN']].apply(pd.to_datetime, format = '%d%b%Y')

Desired output format for all: 1 July 2022

ValueError I'm getting:

time data '18 July 2022' does not match format '%d-%b-%Y' (match)

CodePudding user response:

to remove all dashes/slashes/hyphens from strings you can just use replace method:

df.apply(lambda x: x.str.replace('[/-]',' ',regex=True))

>>>
'''
  INITIATION DATE      DATE CUT    DATE GIVEN
0     1 July 2022  21 July 2022  11 July 2022
1    17 July 2022  16 July 2022  21 July 2022
2    16 July 2022  01 July 2022  09 July 2022
3    19 July 2022  31 July 2022  27 July 2022

and if you also need to conver strings to datetime then try this:

df.apply(lambda x: pd.to_datetime(x.str.replace('[/-]',' ',regex=True)))

>>>
'''
  INITIATION DATE   DATE CUT DATE GIVEN
0      2022-07-01 2022-07-21 2022-07-11
1      2022-07-17 2022-07-16 2022-07-21
2      2022-07-16 2022-07-01 2022-07-09
3      2022-07-19 2022-07-31 2022-07-27

CodePudding user response:

You can use pd.to_datetime to convert strings to datetime objects. The function takes a format argument which specifies the format of the datetime string, using the usual format codes

df['INITIATION DATE'] = pd.to_datetime(df['INITIATION DATE'], format='%d-%B-%Y').dt.strftime('%d %B %Y')
df['DATE CUT'] = pd.to_datetime(df['DATE CUT'], format='%d %B %Y').dt.strftime('%d %B %Y')
df['DATE GIVEN'] = pd.to_datetime(df['DATE GIVEN'], format='%d/%B/%Y').dt.strftime('%d %B %Y')

output

INITIATION DATE      DATE CUT    DATE GIVEN
0    01 July 2022  21 July 2022  11 July 2022
1    17 July 2022  16 July 2022  21 July 2022
2    16 July 2022  01 July 2022  09 July 2022
3    19 July 2022  31 July 2022  27 July 2022

You get that error because your datetime strings (e.g. '18 July 2022') do not match your format specifiers ('%d-%b-%Y') because of the extra hyphens in the format specifier.

  • Related