Keep Date Values Only When a Pandas DataFrame Column Includes New Lines, non-words

Time:05-04

I have imported the sample data set below into a pandas DataFrame.
My goal is to produce the output shown under "wants" from the data under "have".
In other words, I want to keep only the date values from a column where they are mixed with time components and some unwanted non-word characters.
I tried datetime.date, but it does not work here because it only applies to values that are already datetimes.
I have also tried several regexes, but they remove the hyphens as well and leave a space between year, month, and day.
What would be a good way to address this? Any help would be appreciated.

have

ID         Date_time  
210        01-01-2016\r\n01:07 PM       
205        01-06-2017\r\n01:10 PM            
...           ...                
1504       ââ¬Å½30-10-2014\r\n01:15 AM      
1544       ââ¬Å½11-10-2018\r\n05:38 AM          

wants

ID         Date_time  
210        2016-01-01      
205        2017-06-01            
...           ...                
1504       2014-10-30
1544       2018-10-11

CodePudding user response:

Use str.extract combined with pandas.to_datetime:

import pandas as pd
# Raw string so that \d is passed to the regex engine unchanged
df['Date_time'] = pd.to_datetime(
    df['Date_time'].str.extract(r'(\d{2}-\d{2}-\d{4})', expand=False),
    format='%d-%m-%Y')
print(df)

# Output
     ID  Date_time
0   210 2016-01-01
1   205 2017-06-01
2  1504 2014-10-30
3  1544 2018-10-11
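
For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same approach. The DataFrame is a hypothetical reconstruction of the "have" sample. The "\u200e" (left-to-right mark) prefix is an assumption about what the garbled characters in front of some dates originally were. The errors="coerce" argument and the Date_str column are optional additions, not part of the answer above.

```python
import pandas as pd

# Hypothetical reconstruction of the "have" sample. "\u200e" is an assumed
# stand-in for the invisible characters in front of some of the dates.
df = pd.DataFrame({
    "ID": [210, 205, 1504, 1544],
    "Date_time": [
        "01-01-2016\r\n01:07 PM",
        "01-06-2017\r\n01:10 PM",
        "\u200e30-10-2014\r\n01:15 AM",
        "\u200e11-10-2018\r\n05:38 AM",
    ],
})

# Keep only the first dd-mm-yyyy token; rows without a match become NaN.
extracted = df["Date_time"].str.extract(r"(\d{2}-\d{2}-\d{4})", expand=False)

# Parse as day-month-year; errors="coerce" yields NaT instead of raising
# if a row cannot be parsed.
df["Date_time"] = pd.to_datetime(extracted, format="%d-%m-%Y", errors="coerce")

# Optional: a plain-string version of the date, if datetime64 is not wanted.
df["Date_str"] = df["Date_time"].dt.strftime("%Y-%m-%d")

print(df)
```

Because the pattern only matches digits and hyphens, the line breaks, the time part, and any stray leading characters are dropped without having to list them explicitly.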