Clean regex output required

Time:12-22

I am new to regex and can't work around an issue. With this code, I need to extract dates given in multiple formats. The regex code is giving me additional quote marks and commas. Is there a way to remove those and extract the dates only?

Code:

import re

text = '''04/20/2009; 04/20/09; 4/20/09; 4/3/09
Mar-20-2009; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009;
'''

# [12] matches a leading 1 or 2 (inside [...], a "|" would just be a literal character)
xx = r'(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})|([12]\d{3})'

matches = re.findall(xx, text)
matches

Output:

[('04/20/2009', ''),
 ('04/20/09', ''),
 ('4/20/09', ''),
 ('4/3/09', ''),
 ('', '2009'),
 ('', '2009'),
 ('', '2009'),
 ('', '2009'),
 ('', '2009')]
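The tuples come from the capturing groups: when a pattern contains more than one group, re.findall returns one tuple per match holding every group, with an empty string for the group that did not take part. A minimal sketch, assuming the same text and the same two alternatives (leading-digit class written as [12]): turning the groups into non-capturing (?:...) groups makes findall return each whole match as a plain string.

```python
import re

text = '''04/20/2009; 04/20/09; 4/20/09; 4/3/09
Mar-20-2009; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009;
'''

# Same alternatives as the question, but non-capturing (?:...) groups,
# so findall returns the full match text instead of a tuple of groups.
xx = r'(?:\d{1,2}[/-]\d{1,2}[/-]\d{2,4})|(?:[12]\d{3})'

matches = re.findall(xx, text)
print(matches)
# ['04/20/2009', '04/20/09', '4/20/09', '4/3/09', '2009', '2009', '2009', '2009', '2009']
```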

Answer:

This doesn't exactly answer the question, but consider using the dateutil module (package python-dateutil), whose parser already handles many different date formats:

from dateutil import parser  # third-party: pip install python-dateutil

text = '''04/20/2009; 04/20/09; 4/20/09; 4/3/09
Mar-20-2009; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009;
'''

# Trim the trailing newline/semicolon, then treat line breaks as separators too
text = text.strip('\n;').replace('\n', ';')

# Parse each date individually (strip the space left after each ';')
dates = [parser.parse(date.strip()) for date in text.split(';')]
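If a third-party package is not an option, the same split-then-parse idea can be sketched with the standard library by trying a list of datetime.strptime formats in order. This is a swapped-in technique, not dateutil: the FORMATS list and the parse_date helper are hypothetical and cover only the sample text, and %b/%B match month names in the current locale (English in the default C locale).

```python
import re
from datetime import date, datetime

# Hypothetical format list, tried in order; extend it for other inputs.
FORMATS = [
    "%m/%d/%Y",    # 04/20/2009
    "%m/%d/%y",    # 04/20/09, 4/20/09, 4/3/09
    "%b-%d-%Y",    # Mar-20-2009
    "%b %d, %Y",   # Mar 20, 2009
    "%B %d, %Y",   # March 20, 2009
    "%b. %d, %Y",  # Mar. 20, 2009
    "%b %d %Y",    # Mar 20 2009
]

def parse_date(s: str) -> date:
    """Return the date for the first format in FORMATS that matches s exactly."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(s, fmt).date()
        except ValueError:
            continue  # this format does not fit; try the next one
    raise ValueError(f"no known format matches {s!r}")

text = '''04/20/2009; 04/20/09; 4/20/09; 4/3/09
Mar-20-2009; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009;
'''

# Split on semicolons and newlines, drop surrounding spaces and empty pieces.
pieces = [p.strip() for p in re.split(r"[;\n]", text)]
dates = [parse_date(p) for p in pieces if p]
print(dates)
```

Because strptime rejects leftover characters, the order only matters where two formats could both fit the same string; here "%m/%d/%Y" needs a four-digit year, so two-digit years fall through to "%m/%d/%y".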

Answer:

Each tuple in the matches list has exactly one non-empty element (only one side of the | alternation can match), so you can pick that element and then use the join method to concatenate the results into a single string.

For example, the following code extracts the date strings from the matches list and joins them into a single string:

date_strings = [date[0] or date[1] for date in matches]
date_string = ' '.join(date_strings)

This creates a new list date_strings containing only the matched text from each tuple (date[0] or date[1] picks whichever group is non-empty), and then joins the elements of that list into a single string, separated by single spaces.
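An alternative that skips the tuple step while keeping the pattern unchanged is re.finditer, which yields match objects; group(0) returns the whole matched text no matter how many groups the pattern has. A minimal sketch, assuming the question's text and pattern (leading-digit class written as [12]):

```python
import re

text = '''04/20/2009; 04/20/09; 4/20/09; 4/3/09
Mar-20-2009; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009;
'''

# The question's pattern, still with two capturing groups.
xx = r'(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})|([12]\d{3})'

# group(0) is the full match, so the empty group never appears.
date_strings = [m.group(0) for m in re.finditer(xx, text)]
date_string = ' '.join(date_strings)
print(date_string)
# 04/20/2009 04/20/09 4/20/09 4/3/09 2009 2009 2009 2009 2009
```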

Answer:

From what I understand, you're generating a list of tuples, but what you want is just a text string that's a vertical list of the results?

You can accomplish that by first joining the individual tuple contents together with an empty string, then joining the list of the resulting strings together with a new-line character:

print("\n".join(map(''.join, matches)))

Output:

04/20/2009
04/20/09
4/20/09
4/3/09
2009
2009
2009
2009
2009
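Note that every entry from the second line comes out as a bare 2009: the numeric alternative cannot match the month-name formats, so only the year alternative fires there. A hedged sketch of a pattern that also captures those forms, tuned only to this sample text (the English month list and the allowed separators are assumptions), with no capturing groups so findall returns plain strings:

```python
import re

text = '''04/20/2009; 04/20/09; 4/20/09; 4/3/09
Mar-20-2009; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009;
'''

# Hypothetical extension: numeric dates, OR an English month abbreviation
# (optionally followed by more lowercase letters and/or a ".") + day + year.
date_re = re.compile(
    r"\d{1,2}[/-]\d{1,2}[/-]\d{2,4}"                                  # 04/20/2009, 4/3/09
    r"|(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?"  # Mar, March, Mar.
    r"[ -]\d{1,2},?[ -]\d{4}"                                         # -20-2009, " 20, 2009"
)

dates = date_re.findall(text)
print("\n".join(dates))
```

The (?:...) around the month names only groups the alternatives; since it does not capture, findall still returns each whole match.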