Within a string, I'm trying to match all characters before the first comma, but I'm also getting matches like these:
It takes hard, daft,
Not once did I stop and say, as I do now,
Below is my regex:
match = re.match(r".*,", temp)
example:
import re

list = ['In the morning, frank crashed his car.', "Basically, he doesn't know how to drive."]
output_list = []
for i in list:
    match = re.match(r".*,", i)
    output_list.append(match.group())
I want to extract these two:
In the morning,
Basically,
CodePudding user response:
Match everything before the first comma:
^(.+?),
Example:
import re
list = ['In the morning, frank crashed his car.', 'Basically, he doesn\'t know how to drive.']
output_list = []
for i in list:
    # .+? is non-greedy, so the match stops at the first comma
    match = re.match(r"^(.+?),", i)
    output_list.append(match.group())
print(output_list)
Output:
['In the morning,', 'Basically,']
This website is great for learning regex: https://regex101.com/
CodePudding user response:
I am assuming you want to match anything before the first occurrence of a comma character.
If this is the case, try the regex [^,]* (any run of characters other than a comma). In Python it looks as follows:
match = re.match(r"[^,]*", temp)
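As a quick check of what that pattern returns, here is a minimal sketch using the first sentence from the question as temp. Note that [^,]* stops before the comma, so the comma itself is not part of the match; the second pattern, with a literal comma appended, is my own variation for the case where the comma should be kept.

```python
import re

temp = "In the morning, frank crashed his car."

# [^,]* matches a run of non-comma characters, so it stops before the first comma.
without_comma = re.match(r"[^,]*", temp).group()

# Appending a literal comma to the pattern includes the comma in the match.
# This returns None when the string contains no comma at all.
with_comma_match = re.match(r"[^,]*,", temp)
with_comma = with_comma_match.group() if with_comma_match else None

print(without_comma)  # In the morning
print(with_comma)     # In the morning,
```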
On top of that, maybe you will find this sandbox helpful for your trial and error: https://regexr.com/
However, instead of using a regex, I'd suggest splitting each string on commas and taking the first element of the resulting list, e.g.
list = ['In the morning, frank crashed his car.', "Basically, he doesn't know how to drive."]
output_list = []
for i in list:
    output_list.append(i.split(',')[0])
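Two things to keep in mind with split: the result drops the comma, and a string with no comma comes back whole. If that matters, one possible variation (a sketch, not part of the suggestion above) is str.partition, which also reports whether the separator was found. The third sentence below is a hypothetical extra case added only to show the no-comma behaviour.

```python
sentences = [
    "In the morning, frank crashed his car.",
    "Basically, he doesn't know how to drive.",
    "No comma here.",  # hypothetical extra case, not from the question
]

output_list = []
for sentence in sentences:
    # partition returns (text before, separator, text after);
    # the separator is "" when no comma was found.
    head, sep, _ = sentence.partition(",")
    if sep:
        output_list.append(head + sep)

print(output_list)  # ['In the morning,', 'Basically,']
```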
CodePudding user response:
You don't need a regex for this; you could use str.find() and then slice the string from the beginning up to the found position.
#!/usr/bin/env python3
sentences = [
    "In the morning, frank crashed his car.",
    "Basically, he doesn't know how to drive."]
output_list = []
for sentence in sentences:
    pos = sentence.find(",")
    if pos != -1:
        # since you also want the ',', slice to pos + 1
        output_list.append(sentence[0:pos + 1])
print(output_list)
The output:
['In the morning,', 'Basically,']
If you want to use re for this, you have to fix your regex to use a non-greedy match on the *, which is greedy by default and will try to match as much as possible, as described in the re docs:
*?, +?, ??
The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn't desired; if the RE <.*> is matched against '<a> b <c>', it will match the entire string, and not just '<a>'. Adding ? after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using the RE <.*?> will match only '<a>'.
Something like this probably does what you want (untested):
#!/usr/bin/env python3
import re
sentences = [
    "In the morning, frank crashed his car, yep.",
    "Basically, he doesn't know how to drive."]
output_list = []
for sentence in sentences:
    # .*? is non-greedy, so the match ends at the first comma
    if match := re.match(r".*?,", sentence):
        output_list.append(match[0])
print(output_list)
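To see the greedy and non-greedy forms side by side, here is a small sketch that runs both patterns against the first sentence above, which contains two commas. The variable names greedy and lazy are just for illustration.

```python
import re

sentence = "In the morning, frank crashed his car, yep."

# Greedy: .* grabs as much as it can, so the match runs to the LAST comma.
greedy = re.match(r".*,", sentence).group()

# Non-greedy: .*? grabs as little as it can, so the match ends at the FIRST comma.
lazy = re.match(r".*?,", sentence).group()

print(greedy)  # In the morning, frank crashed his car,
print(lazy)    # In the morning,
```

This is why the original pattern returned text like "Not once did I stop and say, as I do now,": the greedy .* backtracked only as far as the final comma in the line.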