How do I use regex to find an abnormal date in Python?

Time:07-05

I'm currently cleaning a dataset that has a column of strings containing various delimiters: #, ?, |, ; and /.

I want to capture any date in the string that is not in the format 2022-09-19T07:20:00. (The years in my dataset usually have the form 20XX, e.g. 2022 or 2023.)

How do I capture these outliers without writing a complex regex?

Here's an example of an outlier: 5002522-03-04T01:03:00

Here's a sample string:

0/0/Just/Some/2022-07-06T17:05:00/2022-07-06T19:25:00/Sample/6780/Data/in///my_Dataset

Please advise.

Answer:

This pattern should match the outliers, based on the example provided:

[\/#?:|][^\/#?:|]+\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[\/#?:|]

https://regex101.com/r/6R4YMQ/1
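A minimal sketch of applying the pattern in Python with re.finditer. It assumes the character between the negated class and \d{4} is a +, and the test string is hypothetical: the question's sample with the outlier spliced in. Each match includes the surrounding delimiters, so the field itself is match[1:-1].

```python
import re

# The pattern above, with '+' after the negated class so that one or more
# non-delimiter characters may precede the well-formed timestamp.
OUTLIER = re.compile(r"[\/#?:|][^\/#?:|]+\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[\/#?:|]")

# Hypothetical input: the question's sample with one outlier spliced in.
sample = "0/0/Just/Some/2022-07-06T17:05:00/5002522-03-04T01:03:00/Sample/6780/Data/in///my_Dataset"

for m in OUTLIER.finditer(sample):
    print(m.group())        # prints /5002522-03-04T01:03:00/
    print(m.group()[1:-1])  # prints 5002522-03-04T01:03:00 (delimiters stripped)
```

On the unmodified sample string from the question, OUTLIER.search returns None, because both timestamps there sit directly against their delimiters.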

The part \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2} matches a normal timestamp.
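On its own, that timestamp pattern cannot tell an outlier from a normal date, because searching finds a well-formed tail inside the outlier. The short sketch below (using only the standard re module) shows the difference between search and fullmatch. This is why the answer anchors the timestamp between delimiters.

```python
import re

# The "normal" timestamp pattern, as used in the answer.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")

outlier = "5002522-03-04T01:03:00"

# search() looks for the pattern anywhere in the string, so it finds the
# well-formed tail of the outlier and cannot flag it as abnormal.
print(TIMESTAMP.search(outlier).group())  # 2522-03-04T01:03:00

# fullmatch() requires the entire string to be a timestamp.
print(TIMESTAMP.fullmatch(outlier))  # None
print(TIMESTAMP.fullmatch("2022-07-06T17:05:00") is not None)  # True
```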

The match must start and end with a delimiter from the class [\/#?:|].

If one or more non-delimiter characters, [^\/#?:|]+, come between the opening delimiter and the timestamp, the field is an outlier and the pattern matches.
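Because the pattern consumes a delimiter on each side, it can miss an outlier at the very start or end of the string, and the second of two adjacent outliers (the first match eats their shared delimiter). A sketch of a simpler alternative splits on the delimiters and flags every field that contains a timestamp but is not exactly one. It assumes the delimiter set listed in the question (/ # ? | ;). A colon cannot be a split delimiter here, since the timestamps themselves contain colons. The function name find_abnormal_dates is hypothetical. Note that this version also flags junk after the timestamp, a slightly broader definition than the regex above.

```python
import re

# Delimiters from the question; ':' is excluded because timestamps contain it.
DELIMITERS = re.compile(r"[/#?;|]")
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")


def find_abnormal_dates(text):
    """Return fields that contain a timestamp but are not exactly one (hypothetical helper)."""
    return [
        field
        for field in DELIMITERS.split(text)
        if TIMESTAMP.search(field) and not TIMESTAMP.fullmatch(field)
    ]


# Hypothetical input: outliers at the start, back to back, and at the end.
text = ("5002522-03-04T01:03:00/2022-07-06T17:05:00/"
        "x2022-01-01T00:00:00;y2023-02-02T00:00:00|7772022-09-19T07:20:00")
print(find_abnormal_dates(text))
# ['5002522-03-04T01:03:00', 'x2022-01-01T00:00:00',
#  'y2023-02-02T00:00:00', '7772022-09-19T07:20:00']
```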
