For the sentence
`'Foo bar was open on 12.03.2022 and closed on 3.05.22.'`
and the respective list `['12.03.2022', '4.04.2022', '3.05.22']`,
I want to get, for each date in the list that occurs in the sentence, its start and end indices in the sentence as a tuple.
In this case: `[(20, 29), (45, 51)]` (end indices inclusive).
I have found the dates with a regex, but I cannot get their indices.
```python
import re

DAY = r'(?:(?:0)[1-9]|[12]\d|3[01])'  # day can be from 1 to 31 with a leading zero
MONTH = r'(?:(?:0)[1-9]|1[0-2])'  # month can be 1 to 12 with a leading zero
YEAR1 = r'(?:(?:20|)\d{2}|(?:19|){9}[0-9])'  # Restricted the year to begin in 20th or 21st century
# Also the first two digits may be skipped if data is represented as dd.mm.yy
YEAR2 = r'(?:20\d{2}|199[0-9])'
BEGIN_LINE1 = r'(?<!\w)'
DELIM1 = r'(?:[\,\/\-\._])'
DELIM2 = r'(?:[\,\/\-\._])?'

# combined, several options
NUM_DATE = f"""(?P<date>
    (?:
        # DAY MONTH YEAR
        (?:{BEGIN_LINE1}{DAY}{DELIM1}{MONTH}{DELIM1}{YEAR1})
        |
        (?:{BEGIN_LINE1}{DAY}{DELIM1}{MONTH})
        |
        (?:{BEGIN_LINE1}{MONTH}{DELIM1}{YEAR1})
        |
        (?:{BEGIN_LINE1}{DAY}{DELIM2}{MONTH}{DELIM2}{YEAR2})
        |
        (?:{BEGIN_LINE1}{MONTH}{DELIM2}{YEAR2})
    )
)"""
myDate = re.compile(f'{NUM_DATE}', re.IGNORECASE | re.VERBOSE | re.UNICODE)


def find_date(subject):
    """Return the distinct date strings matched by myDate in subject.

    Args:
        subject (str | None): text to search.

    Returns:
        list[str] | None: unique matches (order not preserved), or None if subject is None.
    """
    if subject is None:
        return subject
    dates = list(set(myDate.findall(subject)))
    return dates
```
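Because `findall` returns only the matched strings, the positions are lost before `find_date` returns. `re.finditer` instead yields `Match` objects whose `start()` and `end()` (or `span()`) carry the indices. The sketch below assumes a deliberately simplified, hypothetical pattern `SIMPLE_DATE` in place of the question's `NUM_DATE`, and a hypothetical helper name `find_date_spans`; the same `finditer` loop should also work with `myDate` unchanged. Unlike `find_date`, it keeps match order and duplicates.

```python
import re

# Hypothetical simplified stand-in for NUM_DATE: d.m.yy up to dd.mm.yyyy
SIMPLE_DATE = re.compile(r'(?<!\w)\d{1,2}\.\d{1,2}\.(?:\d{4}|\d{2})(?!\d)')


def find_date_spans(subject):
    """Return (start, end_inclusive) for every date match in subject, in order."""
    if subject is None:
        return None
    # m.end() is exclusive, so subtract 1 to get the inclusive end index
    return [(m.start(), m.end() - 1) for m in SIMPLE_DATE.finditer(subject)]


sent = 'Foo bar was open on 12.03.2022 and closed on 3.05.22.'
print(find_date_spans(sent))  # [(20, 29), (45, 51)]
```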
Answer:
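Use `re.search`, escaping each date with `re.escape` so that `.` matches a literal dot rather than any character:

```python
import re

sent = 'Foo bar was open on 12.03.2022 and closed on 3.05.22.'
date_list = ['12.03.2022', '4.04.2022', '3.05.22']
# Search once per date; re.search returns None for dates that are absent
matches = (re.search(re.escape(date), sent) for date in date_list)
hits = [m for m in matches if m]
# indices of first match:
hits[0].span()
```

`hits[0].span()` gives the indices, here `(20, 30)`. The end index is exclusive, so subtract 1 from it to get the inclusive `(20, 29)` from the question. `hits[0].group()` gives the matched substring.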
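`re.search` reports only the first occurrence of each date, and even an escaped `'3.05.22'` matches the tail of `'13.05.22'`. A minimal sketch, assuming the goal is every occurrence of any listed date that is not part of a longer number: join the escaped dates into one alternation and scan the sentence once with `re.finditer`. The function name `find_listed_dates` is hypothetical.

```python
import re


def find_listed_dates(sentence, dates):
    """Return (start, end_inclusive) spans of every occurrence of any date in dates."""
    if not dates:
        return []  # an empty alternation would match the empty string everywhere
    # Escape so '.' is literal; longest first so a longer date is tried first at a position
    alternatives = sorted(map(re.escape, dates), key=len, reverse=True)
    # (?<!\d) and (?!\d) stop '3.05.22' from matching inside '13.05.22' or '3.05.221'
    pattern = re.compile(r'(?<!\d)(?:' + '|'.join(alternatives) + r')(?!\d)')
    # finditer scans left to right, so spans come out in sentence order
    return [(m.start(), m.end() - 1) for m in pattern.finditer(sentence)]


sent = 'Foo bar was open on 12.03.2022 and closed on 3.05.22.'
date_list = ['12.03.2022', '4.04.2022', '3.05.22']
print(find_listed_dates(sent, date_list))  # [(20, 29), (45, 51)]
print(find_listed_dates('Opened 13.05.22, reopened 3.05.22.', ['3.05.22']))  # [(26, 32)]
```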
Answer:
Using a regular `for` loop:

```python
sent = 'Foo bar was open on 12.03.2022 and closed on 3.05.22.'
date_list = ['12.03.2022', '4.04.2022', '3.05.22']  # avoid shadowing the built-in list
tup = []
for i in date_list:
    if i in sent:
        start_index = sent.index(i)  # first occurrence only
        end_index = start_index + len(i) - 1  # inclusive end index
        tup.append((start_index, end_index))
```

Using a list comprehension:

```python
tup = [(sent.index(i), sent.index(i) + len(i) - 1) for i in date_list if i in sent]
print(tup)
# [(20, 29), (45, 51)]
```
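`sent.index(i)` returns only the first occurrence, so a date that appears twice contributes a single tuple. If every occurrence is wanted without a regex, `str.find` can be called in a loop with a start offset. A sketch with a hypothetical helper `all_spans`; like the loop above, it is plain substring matching, so `'3.05.22'` would still be found inside `'13.05.22'`.

```python
def all_spans(sentence, dates):
    """Return sorted (start, end_inclusive) spans for every occurrence of each date."""
    spans = []
    for date in dates:
        if not date:  # an empty string would "occur" at every index
            continue
        start = sentence.find(date)  # -1 when not found
        while start != -1:
            spans.append((start, start + len(date) - 1))
            # resume the search one character after this occurrence's start
            start = sentence.find(date, start + 1)
    return sorted(spans)


print(all_spans('3.05.22 and 3.05.22', ['3.05.22']))  # [(0, 6), (12, 18)]
```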