Remove text between two certain characters (multiple occurrences)

Time:11-26

I want to remove everything between the character "-" and the next "\n", including the two delimiters themselves.

For example, string = "hi.-hello\n good morning" the result I want to get is string = "hi. good morning"

and for string = "hi.-hello\n good morning -axq\n" the result I want to get is string = "hi. good morning axq"

I found these examples, which I tried to use as a reference for adapting to my case:

>>> import re
>>> s = "hi.)hello| good morning"
>>> re.sub(r"(?<=\)).*?(?=\|)", "", s)
'hi.)| good morning'

and also this one

>>> import re
>>> x = "This is a sentence. (once a day) [twice a day]"
>>> re.sub(r"([\(\[]).*?([\)\]])", r"\g<1>\g<2>", x)
'This is a sentence. () []'

and this one

>>> import re
>>> x = "This is a sentence. (once a day) [twice a day]"
>>> re.sub(r"[\(\[].*?[\)\]]", "", x)
'This is a sentence.  '

But I still can't work out the pattern for my case. I would also like to learn the general syntax, so that I can customize it for other delimiters.

CodePudding user response:

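This works when you want to delete the text between a single pair of delimiters, e.g. (-, \n). Because . does not match a newline by default, .* stops at the end of the line, so each match runs from a - to the \n that ends that line. Deleting text between several different pairs needs a different pattern; see the edit further down.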

import re
s = "hi.-hello\n good morning and a good-long \n day"
print(re.sub(r"-.*\n", "", s))
>>> hi. good morning and a good day
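To make the "stop at the first \n" behaviour explicit rather than relying on . not matching a newline, the quantifier can be made lazy with .*?. Below is a minimal sketch applied to the strings from this thread; the function name strip_dash_to_newline is made up for illustration and is not part of the re module.

```python
import re

def strip_dash_to_newline(text):
    # Remove every run that starts at "-" and ends at the next newline,
    # delimiters included. ".*?" is lazy, so it stops at the first "\n".
    return re.sub(r"-.*?\n", "", text)

print(strip_dash_to_newline("hi.-hello\n good morning"))
# hi. good morning
```

For this particular pair the lazy and greedy versions give the same result, because the closing delimiter is the newline itself. The difference matters once the closing delimiter is a character that can appear more than once on the same line.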

Edit: I have found the trick for several symbol pairs:

s = "hi.-hello\n good morning and a good-long \n day (delete this), bye"
result = re.sub(r"[\(\-].*?[\n\)]", "", s)
print(result)
>>> hi. good morning and a good day , bye

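For several pairs, put all the opening symbols into one character class and all the closing symbols into another: [<remove from>].*?[<remove to>]. Symbols with a special meaning are escaped with a backslash, in this example \( and \- in the first class and \) in the second; \n is simply the newline. Note that the two classes do not tie an opening symbol to its own closing symbol: a match starts at any opening symbol and ends at whichever closing symbol comes first, so text between - and ) would be removed as well.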
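If each opening symbol should only be closed by its own partner, one option is an alternation with one branch per pair instead of two character classes. The following is a rough sketch, not the only way to do it; remove_between and its pairs parameter are hypothetical names chosen for this example, and re.escape is used so the delimiters never need manual backslashes.

```python
import re

def remove_between(text, pairs):
    # pairs: list of (start, end) delimiter strings (hypothetical helper).
    # Each pair becomes its own branch "start.*?end", so "-" can only be
    # closed by "\n" and "(" only by ")".
    branches = [re.escape(start) + ".*?" + re.escape(end) for start, end in pairs]
    return re.sub("|".join(branches), "", text)

text = "hi.-hello\n good morning and a good-long \n day (delete this), bye"
print(remove_between(text, [("-", "\n"), ("(", ")")]))
# hi. good morning and a good day , bye
```

On the example string this gives the same result as the character-class version. The two differ on input such as "a-b) c\nd": the character-class pattern removes only "-b)", while the sketch above removes "-b) c\n". As written, . still does not match a newline, so a ( ... ) span that crosses a line break is left alone unless re.DOTALL is passed.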
