I am wondering how to combine the following two functions into one. They remove an entire word if it contains "_" or "/", respectively.
I have tried the following, and the code fulfils its purpose. It is, however, cumbersome, and I am wondering how to simplify it.
text = "This is _a default/ text"
def filter_string1(string):
    a = []
    for i in string.split():
        if "_" not in i:
            a.append(i)
    return ' '.join(a)
def filter_string2(string):
    a = []
    for i in string.split():
        if "/" not in i:
            a.append(i)
    return ' '.join(a)
text_no_underscore = filter_string1(text)
text_no_underscore_no_slash = filter_string2(text_no_underscore)
print(text_no_underscore_no_slash)
The output is (as desired):
"This is text"
CodePudding user response:
You can combine the if conditions.
text = "This is _a default/ text"
def filter_string(string):
    a = []
    for i in string.split():
        # keep the word only if it contains neither character
        if "_" not in i and "/" not in i:
            a.append(i)
    return ' '.join(a)
print(filter_string(text))
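The combined condition also generalises to any set of characters. A minimal sketch, assuming a hypothetical helper name `remove_words_containing` and a `chars` parameter, neither of which appears in the original code:

```python
def remove_words_containing(string, chars="_/"):
    # Keep only the words that contain none of the characters in `chars`.
    # (`remove_words_containing` and `chars` are illustrative names.)
    return " ".join(
        word for word in string.split()
        if not any(c in word for c in chars)
    )

text = "This is _a default/ text"
print(remove_words_containing(text))
# This is text
```

Adding another character to filter on then only means extending the `chars` string.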
CodePudding user response:
Python's re module has a function called re.sub that lets you accomplish this quickly.
import re

def remove_words(text):
    return re.sub(
        pattern=r'\s_[^/]*/',  # regular expression used to match the parts to remove
        repl='',               # replace matched parts with the empty string
        string=text            # use `text` as input
    )
Explaining the regular expression \s_[^/]*/
(by deconstructing its parts):
\s_ : match a whitespace character followed by an underscore
[^/]* : match any character sequence not containing a forward slash (the sequence may have length 0)
/ : match the forward slash
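As a quick check of the deconstruction above, `re.findall` can list the spans a pattern matches before anything is removed. The sketch below uses a negated character class, `[^/]`, which is the standard way to write "any character except a forward slash"; the name `PATTERN` is introduced here only for illustration.

```python
import re

# Hypothetical name, used only for this illustration.
PATTERN = r'\s_[^/]*/'

text = "This is _a longer/ _and also custom/ text"

# Each match starts at " _" and stops at the first "/" that follows.
matches = re.findall(PATTERN, text)
print(matches)
# [' _a longer/', ' _and also custom/']
```

Each match runs from the underscore to the next slash, so a word between the two (such as "also") is removed together with them.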
Testing the function:
text = "This is _a default/ text"
text_no_underscore_no_slash = remove_words(text)
print('Result:', text_no_underscore_no_slash)
# Result: This is text
text = "This is _a longer/ _and also custom/ text"
text_no_underscore_no_slash = remove_words(text)
print('Result:', text_no_underscore_no_slash)
# Result: This is text
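By the way, I think your original code has a bug if everything between "_" and "/" is meant to be removed: it drops only the individual words that contain "_" or "/", so for the input below it returns "This is also text".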
text = "This is _a longer/ _and also custom/ text"
text_no_underscore = filter_string1(text)
text_no_underscore_no_slash = filter_string2(text_no_underscore)
print(text_no_underscore_no_slash == 'This is text')
# False
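If the goal is instead the one stated in the question, removing only the individual words that contain "_" or "/", a regular expression can do that as well. A sketch, assuming a hypothetical function name `remove_marked_words`:

```python
import re

def remove_marked_words(text):
    # Delete every whitespace-delimited word that contains "_" or "/".
    stripped = re.sub(r'\S*[_/]\S*', '', text)
    # Collapse the leftover runs of whitespace into single spaces.
    return ' '.join(stripped.split())

print(remove_marked_words("This is _a longer/ _and also custom/ text"))
# This is also text
```

This gives the same result as the two original filter functions applied one after the other.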