Let's say that I have this string:
a = 'ashfafhkiojojojhohkhgiobbboddbbgoifbafjgibibfoobfbobobfbafnongokhofgoon'
My goal is to create a function that gets me all substrings that start with 'af' and end with 'kh'. In this example, I would get 2 substrings:
'afhkiojojojhohkh' and 'afjgibibfoobfbobobfbafnongokh'
I would also like to get the length of these substrings and their location within the larger string.
I have thought about using a for loop but I did not get very far. Any help is very much appreciated.
Thanks.
CodePudding user response:
Using the built-in module re for regular expressions:
import re
text = 'ashfafhkiojojojhohkhgiobbboddbbgoifbafjgibibfoobfbobobfbafnongokhofgoon'
# tuples of the form (substr, (start, end), length)
# int.__rsub__(start, end) evaluates to end - start, i.e. the match length
matches = [(match.group(0), match.span(), int.__rsub__(*match.span()),) for match in re.finditer(r'(af.*?kh)', text)]
longest = max(matches, key=lambda pairs: pairs[-1])
print(matches)
print(longest)
EDIT: if the assignment expression operator := is supported (Python 3.8+), the terms in the list comprehension can be simplified like this:
(match.group(0), pos:=match.span(), int.__rsub__(*pos))
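If you want this as a reusable function, here is a minimal sketch along the same lines. The name find_between and its parameters are my own choice for illustration, not something from the question. It uses re.escape so that start and end are treated as literal text, and like the pattern above it returns only the shortest, non-overlapping matches:

```python
import re

def find_between(text, start, end):
    """Return (substring, start_index, length) for each shortest,
    non-overlapping match that begins with `start` and ends with `end`."""
    # re.escape keeps characters such as '.' or '*' in start/end literal.
    pattern = re.escape(start) + r'.*?' + re.escape(end)
    return [(m.group(0), m.start(), m.end() - m.start())
            for m in re.finditer(pattern, text)]

a = 'ashfafhkiojojojhohkhgiobbboddbbgoifbafjgibibfoobfbobobfbafnongokhofgoon'
print(find_between(a, 'af', 'kh'))
# [('afhkiojojojhohkh', 4, 16), ('afjgibibfoobfbobobfbafnongokh', 36, 29)]
```

Note that '.' does not match newlines by default, so for multi-line text you would probably want to pass flags=re.DOTALL to re.finditer.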
CodePudding user response:
You can use nested searches looking for the start and end.
A full function with dynamic start and end (you can change the start and end values) would look like:
def find(inp, start, end):
    ls = len(start)
    le = len(end)
    start_and_len = []
    # try every position where `start` could begin
    for i in range(len(inp) - ls + 1):
        if inp[i:i + ls] == start:
            # from there, try every position where `end` could begin
            for j in range(i, len(inp) - le + 1):
                if inp[j:j + le] == end:
                    # (str, start index, len)
                    start_and_len.append((inp[i:j + le], i, j + le - i,))
    return start_and_len
# Use as
>>> a = 'afafaf---khkhkh'
>>> find(a, 'af', 'kh')
[('afafaf---kh', 0, 11),
('afafaf---khkh', 0, 13),
('afafaf---khkhkh', 0, 15),
('afaf---kh', 2, 9),
('afaf---khkh', 2, 11),
('afaf---khkhkh', 2, 13),
('af---kh', 4, 7),
('af---khkh', 4, 9),
('af---khkhkh', 4, 11)]
# Your given example, with more matches
>>> a = 'ashfafhkiojojojhohkhgiobbboddbbgoifbafjgibibfoobfbobobfbafnongokhofgoon'
>>> find(a, 'af', 'kh')
[('afhkiojojojhohkh', 4, 16),
('afhkiojojojhohkhgiobbboddbbgoifbafjgibibfoobfbobobfbafnongokh', 4, 61),
('afjgibibfoobfbobobfbafnongokh', 36, 29),
('afnongokh', 56, 9)]
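The function above returns every start/end pairing. If you only want the nearest end for each start, a variant based on str.find might look like the sketch below. The name find_nearest is hypothetical, and I am assuming the end marker must begin after the start marker finishes, which differs slightly from find when the two markers can overlap:

```python
def find_nearest(inp, start, end):
    """For each occurrence of `start`, return (substring, start_index, length)
    up to the nearest following `end`. Occurrences may overlap."""
    results = []
    i = inp.find(start)
    while i != -1:
        # assumption: `end` may not overlap the `start` marker itself
        j = inp.find(end, i + len(start))
        if j == -1:
            break  # no end marker remains, so later starts cannot match either
        stop = j + len(end)
        results.append((inp[i:stop], i, stop - i))
        i = inp.find(start, i + 1)
    return results

a = 'ashfafhkiojojojhohkhgiobbboddbbgoifbafjgibibfoobfbobobfbafnongokhofgoon'
print(find_nearest(a, 'af', 'kh'))
# [('afhkiojojojhohkh', 4, 16), ('afjgibibfoobfbobobfbafnongokh', 36, 29), ('afnongokh', 56, 9)]
```

On 'afafaf---khkhkh' this gives three results (one per 'af') instead of nine.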