I have imported the sample data set below into a pandas DataFrame.
My plan is to generate output that looks like "wants" from "have".
In other words, I am trying to pick out only the date values when they are mixed with time components and some unnecessary non-word characters.
I tried datetime.date, but it does not seem to work, since it only works on datetime objects and not on strings.
I have also tried several regexes, but they remove the hyphens as well and leave a space between the year, month, and day.
What would be a good way to address this? Any help would be appreciated.
have
ID Date_time
210 01-01-2016\r\n01:07 PM
205 01-06-2017\r\n01:10 PM
... ...
1504 ââ¬Å½30-10-2014\r\n01:15 AM
1544 ââ¬Å½11-10-2018\r\n05:38 AM
wants
ID Date_time
210 2016-01-01
205 2017-06-01
... ...
1504 2014-10-30
1544 2018-10-11
CodePudding user response:
Use str.extract combined with pandas.to_datetime:
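# Pull out the dd-mm-yyyy part (raw string so \d is not treated as a string escape), then parse it
dates = df['Date_time'].str.extract(r'(\d{2}-\d{2}-\d{4})', expand=False)
df['Date_time'] = pd.to_datetime(dates, format='%d-%m-%Y')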
print(df)
# Output
ID Date_time
0 210 2016-01-01
1 205 2017-06-01
2 1504 2014-10-30
3 1544 2018-10-11
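For anyone who wants to try this end to end, here is a minimal self-contained sketch. The DataFrame is a hypothetical reconstruction of the four visible sample rows, and "\u200e" is only an assumed stand-in for the stray leading characters in the last two rows (the regex ignores whatever comes before the date, so the exact characters should not matter). The Date_str column is an extra added for illustration, in case plain 'YYYY-MM-DD' strings are wanted instead of datetime64 values.

```python
import pandas as pd

# Hypothetical reconstruction of the sample data from the question.
# "\u200e" is an assumed placeholder for the stray leading characters.
df = pd.DataFrame({
    "ID": [210, 205, 1504, 1544],
    "Date_time": [
        "01-01-2016\r\n01:07 PM",
        "01-06-2017\r\n01:10 PM",
        "\u200e30-10-2014\r\n01:15 AM",
        "\u200e11-10-2018\r\n05:38 AM",
    ],
})

# Capture only the dd-mm-yyyy part; rows with no match become NaN.
dates = df["Date_time"].str.extract(r"(\d{2}-\d{2}-\d{4})", expand=False)

# Parse as day-month-year; errors="coerce" turns impossible dates into NaT
# instead of raising.
df["Date_time"] = pd.to_datetime(dates, format="%d-%m-%Y", errors="coerce")

# Optional: keep a plain string version formatted as YYYY-MM-DD.
df["Date_str"] = df["Date_time"].dt.strftime("%Y-%m-%d")

print(df)
```

If some rows could not be parsed, df[df['Date_time'].isna()] shows which ones, so they can be inspected before relying on the result.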