I need to know how to use a regex to exclude words that sit between commas. For example, given "Lobasso, Jr., Sion", I don't want the "Jr.". I have two ideas: either include only the words between two commas ("ha,hello,bla" gives "hello"), or exclude the words between the commas ("he,blabla,lado" gives "helado").
CodePudding user response:
Sometimes people add extra designations to their name, and there may be zero or more whitespace characters before or after a comma. To cover those cases (and avoid having to import re), consider using split() followed by strip():
strings = [
"Lobasso, Jr., Sion",
"Lobasso, Jr., B.Sc., Sion",
"Lobasso , Jr. , B.Sc. , Sion",
"Lobasso,Sion"
]
for string in strings:
    # Split on commas, then keep only the first and last fields, trimmed.
    result = string.split(",")
    print(result[0].strip(), result[-1].strip())
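The same split-and-strip idea can be wrapped in a small helper. This is only a sketch: the name first_and_last is made up for illustration and is not part of the answer above.

```python
def first_and_last(name):
    """Return the first and last comma-separated fields, trimmed of whitespace."""
    parts = [part.strip() for part in name.split(",")]
    return parts[0], parts[-1]


for name in ["Lobasso, Jr., Sion", "Lobasso , Jr. , B.Sc. , Sion", "Lobasso,Sion"]:
    # Join the two fields with a single space for display.
    print(" ".join(first_and_last(name)))
```

Note that a string with no comma at all yields the same field twice, so you may want to handle that case separately.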
CodePudding user response:
You can exclude everything from the first comma to the last comma like this:
import re
print(re.sub(',.*,', '', "Lobasso, Jr., Sion"))
print(re.sub(',.*,', '', "he,blabla,lado"))
Output:
Lobasso Sion
helado
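One thing worth checking is how this pattern behaves when there are more than two commas. Because .* is greedy, it spans from the first comma to the last one, whereas a pattern built on [^,]* removes one comma-delimited field at a time. A quick comparison on a made-up test string:

```python
import re

text = "a,b,c,d,e"  # illustrative input, not from the question

# Greedy: matches from the first comma through the last comma.
greedy = re.sub(r",.*,", "", text)

# Per field: each match is a single ",field," and matches do not overlap.
per_field = re.sub(r",[^,]*,", "", text)

print(greedy)     # ae
print(per_field)  # ace
```

Which behavior you want depends on the data: for a name with several designations, the greedy form removes all of them at once.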
CodePudding user response:
Exclude: result = re.sub(r',([^,]*),', '', string)
>>> print(re.sub(r',[^,]*,', '', "Lobasso, Jr., Sion"))
Lobasso Sion
>>> print(re.sub(r',[^,]*,', '', "he,blabla,lado"))
helado
Include: result = ''.join(re.findall(r',([^,]*),', string))
>>> print(''.join(re.findall(r',([^,]*),', "ha,hello,bla")))
hello
In both cases, the regex has the pattern r',([^,]*),' which breaks down as:
( )     a capture group (only necessary for Include), containing
*       zero or more occurrences of
[^,]    any character other than ','
, ,     with a ',' on both sides
If a regex contains exactly one capture group, re.findall() returns only what that group captured rather than the entire matching string, so the Include expression yields whatever was matched by [^,]*, the text between the commas. re.sub() replaces the entire match, surrounding commas included, regardless of the group.
- To include, we find all the occurrences of text surrounded by commas, take them out, and then use ''.join() to stitch them back together with nothing in between.
- To exclude, we replace every occurrence of text surrounded by commas, together with the surrounding commas, with the empty string.
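To see the effect of the capture group, it may help to run re.findall() with and without the parentheses on the example from the question. This is a small illustrative check, and the variable names are made up:

```python
import re

text = "ha,hello,bla"

# With one capture group, findall returns only the group's contents.
with_group = re.findall(r",([^,]*),", text)

# Without a group, findall returns the whole match, commas included.
without_group = re.findall(r",[^,]*,", text)

print(with_group)     # ['hello']
print(without_group)  # [',hello,']
```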