If I had a body of text and wanted to replace "ion" or "s" with nothing but keep the rest of the word (so if the word is reflection it should output reflect), how would I go about that? I have tried:
new_llw = re.sub(r'[a-z]+ion', "", llw)
print(new_llw)
which replaces the whole word, and I tried
if re.search(r'[a-z]+ion', "", llw) is True:
    re.sub('ion', '', llw)
print(llw)
which gives me an error:
TypeError: unsupported operand type(s) for &: 'str' and 'int'
CodePudding user response:
For the ion replacement, you may use a positive lookbehind:
import re
inp = "reflection"
output = re.sub(r'(?<=\w)ion\b', '', inp)
print(output) # reflect
CodePudding user response:
The TypeError: unsupported operand type(s) for &: 'str' and 'int' error occurs because you are calling re.search(r'[a-z]+ion', "", llw) as if it were re.sub. The second argument to re.search is the input string, which is empty here, and the third argument is flags, which expects regex options such as re.A or re.I, optionally combined into a bitmask (re.A | re.I). Since llw is a string, the bitwise operations the re module performs on flags fail with this TypeError.
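To make the argument order concrete, here is a minimal sketch of the failing call next to one possible corrected form. It assumes the pattern in the question was meant to be [a-z]+ion and that llw holds a single word; both are assumptions for illustration.

```python
import re

llw = "reflection"  # sample value, assumed for illustration

# re.search(pattern, string, flags=0): the third positional argument is flags,
# so passing the text there raises a TypeError.
try:
    re.search(r'[a-z]+ion', "", llw)
    raised = False
except TypeError:
    raised = True
print(raised)

# One possible correction: pattern first, then the text. re.search returns a
# match object or None (never True), so test it for truthiness, and keep the
# string that re.sub returns, because strings are immutable.
if re.search(r'[a-z]+ion\b', llw):
    llw = re.sub(r'ion\b', '', llw)
print(llw)  # reflect
```

Note that the re.search check is optional: re.sub simply returns the input unchanged when nothing matches.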
Now, if you need to match ion as a suffix in a word, you can use
new_llw = re.sub(r'\Bion\b', '', llw)
Here, \B matches a location that is not a word boundary. Since the next char, i, is a word char, the location must be immediately preceded with a word char too (a letter, digit or connector punctuation, like _). Then ion matches ion, and \b matches a location that is either at the end of string or immediately followed with a non-word char.
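A small sketch of how the two anchors behave on a few sample strings (the samples are made up for illustration):

```python
import re

pattern = re.compile(r'\Bion\b')

# Hypothetical sample inputs chosen to exercise each anchor.
samples = ["reflection", "ion", "an ion beam", "ionize", "lions"]
results = {s: pattern.sub('', s) for s in samples}

for original, replaced in results.items():
    print(f"{original!r} -> {replaced!r}")
# 'reflection' -> 'reflect'       ion is a suffix inside a word
# 'ion' -> 'ion'                  \B fails: nothing precedes ion
# 'an ion beam' -> 'an ion beam'  \B fails: a space precedes ion
# 'ionize' -> 'ionize'            \b fails: a letter follows ion
# 'lions' -> 'lions'              \b fails: s follows ion
```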
To also match an s suffix:
new_llw = re.sub(r'\B(?:ion|s)\b', '', llw)
Here, (?:...) is a non-capturing group that holds the two alternatives, ion and s.
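Applied to a body of text rather than a single word, the combined pattern could be used as in this sketch (the sample sentences are invented):

```python
import re

suffixes = re.compile(r'\B(?:ion|s)\b')

text = "The reflection of the stars distracts dogs"
stripped = suffixes.sub('', text)
print(stripped)  # The reflect of the star distract dog

# The pattern is purely textual, so short words that merely end in s
# lose their last letter as well.
overstripped = suffixes.sub('', "This is a lesson")
print(overstripped)  # Thi i a lesson
```

If that over-stripping matters, one option is to require a minimum number of letters before the suffix, but that goes beyond what the question asks.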
Variations
If you consider words as letter sequences only, you can use
new_llw = re.sub(r'(?<=[a-zA-Z])(?:ion|s)\b', '', llw) # ASCII only version
new_llw = re.sub(r'(?<=[^\W\d_])(?:ion|s)\b', '', llw) # Any Unicode letters supported
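Here, (?<=[a-zA-Z]) matches a location that is immediately preceded with an ASCII letter, while (?<=[^\W\d_]) requires the preceding char to be a word char that is neither a digit nor an underscore.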
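The difference between the three variants shows up on inputs with non-ASCII letters, digits, or underscores. A quick comparison sketch, with made-up sample words:

```python
import re

variants = {
    "word char": re.compile(r'\B(?:ion|s)\b'),
    "ascii letter": re.compile(r'(?<=[a-zA-Z])(?:ion|s)\b'),
    "unicode letter": re.compile(r'(?<=[^\W\d_])(?:ion|s)\b'),
}

# "caf\u00e9s" is "cafes" with a precomposed e-acute (U+00E9).
samples = ["cats", "caf\u00e9s", "3s", "x_s"]

table = {
    name: [rx.sub('', s) for s in samples]
    for name, rx in variants.items()
}
for name, row in table.items():
    print(f"{name:15} {row}")
# word char:      strips s after any word char, including 3 and _
# ascii letter:   leaves "caf\u00e9s", "3s" and "x_s" untouched
# unicode letter: also strips the s after the e-acute, but not after 3 or _
```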