Kindly assist me in cleaning my date types in python.
My sample data is as follows:
INITIATION DATE | DATE CUT | DATE GIVEN |
---|---|---|
1/July/2022 | 21 July 2022 | 11-July-2022 |
17-July-2022 | 16/July/2022 | 21/July/2022 |
16-July-2022 | 01-July-2022 | 09/July/2022 |
19-July-2022 | 31 July 2022 | 27 July 2022 |
How do I remove all dashes/slashes/hyphens from dates in the different columns? I have 8 columns and 300 rows.
What i tried:
df[['INITIATION DATE', 'DATE CUT', 'DATE GIVEN']]= df[['INITIATION DATE', 'DATE CUT', 'DATE GIVEN']].apply(pd.to_datetime, format = '%d%b%Y')
Desired output format for all: 1 July 2022
ValueError I'm getting:
time data '18 July 2022' does not match format '%d-%b-%Y' (match)
CodePudding user response:
to remove all dashes/slashes/hyphens from strings you can just use replace
method:
df.apply(lambda x: x.str.replace('[/-]',' ',regex=True))
>>>
'''
INITIATION DATE DATE CUT DATE GIVEN
0 1 July 2022 21 July 2022 11 July 2022
1 17 July 2022 16 July 2022 21 July 2022
2 16 July 2022 01 July 2022 09 July 2022
3 19 July 2022 31 July 2022 27 July 2022
and if you also need to conver strings to datetime then try this:
df.apply(lambda x: pd.to_datetime(x.str.replace('[/-]',' ',regex=True)))
>>>
'''
INITIATION DATE DATE CUT DATE GIVEN
0 2022-07-01 2022-07-21 2022-07-11
1 2022-07-17 2022-07-16 2022-07-21
2 2022-07-16 2022-07-01 2022-07-09
3 2022-07-19 2022-07-31 2022-07-27
CodePudding user response:
You can use pd.to_datetime to convert strings to datetime objects. The function takes a format
argument which specifies the format of the datetime string, using the usual format codes
df['INITIATION DATE'] = pd.to_datetime(df['INITIATION DATE'], format='%d-%B-%Y').dt.strftime('%d %B %Y')
df['DATE CUT'] = pd.to_datetime(df['DATE CUT'], format='%d %B %Y').dt.strftime('%d %B %Y')
df['DATE GIVEN'] = pd.to_datetime(df['DATE GIVEN'], format='%d/%B/%Y').dt.strftime('%d %B %Y')
output
INITIATION DATE DATE CUT DATE GIVEN
0 01 July 2022 21 July 2022 11 July 2022
1 17 July 2022 16 July 2022 21 July 2022
2 16 July 2022 01 July 2022 09 July 2022
3 19 July 2022 31 July 2022 27 July 2022
You get that error because your datetime strings (e.g. '18 July 2022'
) do not match your format specifiers ('%d-%b-%Y'
) because of the extra hyphens in the format specifier.