I want to remove the text between the character "-" and the string "\n" (including the delimiters themselves).
For example, for string = "hi.-hello\n good morning" the result I want to get is string = "hi. good morning"
and for string = "hi.-hello\n good morning -axq\n" the result I want to get is string = "hi. good morning axq"
I found these examples, which I tried to use as a reference for adapting a pattern to my case:
>>> import re
>>> s = "hi.)hello| good morning"
>>> re.sub(r"(?<=\)).*?(?=\|)", "", s)
'hi.)| good morning'
and also this one
>>> import re
>>> x = "This is a sentence. (once a day) [twice a day]"
>>> re.sub(r"([\(\[]).*?([\)\]])", r"\g<1>\g<2>", x)
'This is a sentence. () []'
and this one
>>> import re
>>> x = "This is a sentence. (once a day) [twice a day]"
>>> re.sub(r"[\(\[].*?[\)\]]", "", x)
'This is a sentence.  '
But I still can't work out the syntax for my case. I would also like to learn the general syntax, so that I can customize it for other delimiters.
CodePudding user response:
This works when you want to delete the text between a single pair of delimiters, e.g. "-" and "\n". Deleting text between several different pairs requires a closer look at how the function really works.
import re
# "s" is used instead of "str" so the built-in str type is not shadowed
s = "hi.-hello\n good morning and a good-long \n day"
print(re.sub(r"-.*\n", "", s))
# prints: hi. good morning and a good day
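One caveat worth knowing: the pattern above stops at the nearest newline only because "." does not match "\n" by default. With any other closing delimiter, a greedy ".*" runs on to the last closer in the line, while the lazy ".*?" stops at the first one. A minimal sketch of the difference, using a made-up sample string that is not from the question:

```python
import re

# Hypothetical sample text with two parenthesised spans on one line
text = "a(b)c(d)e"

# Greedy: ".*" extends to the LAST ")" so everything from the first "(" is removed
greedy = re.sub(r"\(.*\)", "", text)

# Lazy: ".*?" stops at the FIRST ")" after each "(", so each span is removed separately
lazy = re.sub(r"\(.*?\)", "", text)

print(greedy)  # ae
print(lazy)    # ace
```

For this reason the lazy form is usually the safer default when deleting text between delimiters.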
Edit: I have found the trick for several symbol pairs:
s = "hi.-hello\n good morning and a good-long \n day (delete this), bye"
result = re.sub(r"[\(\-].*?[\n\)]", "", s)
print(result)
# prints: hi. good morning and a good day , bye
For several pairs, put all the start symbols into the first pair of brackets and all the end symbols into the second: [<remove from>].*?[<remove to>]. Each symbol that you want to remove is then written in the form \<symbol to remove start/end>. In this example these are \- and \n (or \( and \)).
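Note that with character classes any start symbol can pair with any end symbol, so "(" followed by a newline would also count as a match. If each opener should only pair with its own closer, one possible approach is to build one alternative per pair and join them with "|". The sketch below assumes that approach; the helper name remove_between and its pairs argument are hypothetical, not part of the re module. re.escape is used so that no delimiter has to be backslash-escaped by hand.

```python
import re

def remove_between(text, pairs):
    """Remove each span that starts with a pair's opener and ends with the same pair's closer.

    `pairs` is a list of (start, end) delimiter strings (a hypothetical interface).
    """
    # re.escape turns every delimiter into a literal, so "(" or "-" needs no manual backslash
    alternatives = [re.escape(start) + r".*?" + re.escape(end) for start, end in pairs]
    # One alternative per pair: "-" can only close with "\n", "(" only with ")"
    pattern = "|".join(alternatives)
    return re.sub(pattern, "", text)

s = "hi.-hello\n good morning and a good-long \n day (delete this), bye"
print(remove_between(s, [("-", "\n"), ("(", ")")]))
# prints: hi. good morning and a good day , bye
```

Adding another pair, for example ("[", "]"), only means appending it to the list. Because "." does not match a newline by default, spans that cross a line break are left alone unless flags=re.DOTALL is passed to re.sub.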