I need to remove the special characters inside angle brackets (`<>`):
```python
import re

date = "<dd_mm_yyyy>"
date_check_pattern = re.sub("[^A-Za-z<>]", "", date)
```
But it does not work for:

```python
date = "<dd>_<mm>_<yyyy>"
# expected output: <ddmmyyyy>
```

How can I remove these angle brackets, except for the first and last occurrence?
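A quick check (the variable names are illustrative) shows why the original pattern fails: `<` and `>` are listed inside the negated character class, so they are never removed, while dropping them from the class removes the outer pair as well.

```python
import re

date = "<dd>_<mm>_<yyyy>"

# "<" and ">" are in the negated class, so every bracket is kept
original = re.sub("[^A-Za-z<>]", "", date)
print(original)  # <dd><mm><yyyy>

# Without "<" and ">" in the class, the outer brackets are stripped too
too_much = re.sub("[^A-Za-z]", "", date)
print(too_much)  # ddmmyyyy
```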
Answer:
You could match from the first `<` up to the last `>` and capture what is in between in group 1.
Then do a second replacement on that group 1 value, removing all characters that are not allowed.
The first pattern matches:

- `<(.*?)>` Match from `<` to `>` and capture what is in between in group 1
- `(?=` Positive lookahead, assert that what is to the right is
  - `[^\s<>]*$` Optionally match any character except a whitespace character, `<` or `>`, till the end of the string
- `)` Close the lookahead
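To see what the lookahead contributes, a small check comparing the group 1 capture with and without it on one of the sample strings (the variable names are illustrative). Without the lookahead, the lazy `.*?` stops at the first `>`; with it, the match can only end at a `>` that is followed by no more whitespace, `<` or `>`, which here is the last one.

```python
import re

s = "file_name<dd>_<mm>_<yyyy>.csv"

# Lazy .*? alone stops at the first ">"
without = re.search(r'<(.*?)>', s).group(1)
print(without)  # dd

# The lookahead rejects every ">" that still has "<" or ">" after it,
# so the match is forced to extend to the last ">"
with_lookahead = re.search(r'<(.*?)>(?=[^\s<>]*$)', s).group(1)
print(with_lookahead)  # dd>_<mm>_<yyyy
```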
The negated character class `[^A-Za-z0-9]+` in the second `re.sub` matches 1 or more of any character that is not listed in the class, i.e. anything other than an ASCII letter or digit.
Example code
```python
import re

strings = [
    "<dd>_<mm>_<yyyy>",
    "<dd_mm_yyyy>",
    "<01>_<01>_<2022>",
    "file_name<dd>_<mm>_<yyyy>.csv",
    "file_name<01>_<01>_<2022>.csv",
    "file_name_<dd>_<mm>_<yyyy>_anything.csv"
]

for s in strings:
    print(
        re.sub(
            r'<(.*?)>(?=[^\s<>]*$)',
            # Keep the outer brackets; strip everything but letters/digits from group 1
            lambda x: f"<{re.sub(r'[^A-Za-z0-9]+', '', x.group(1))}>",
            s
        )
    )
```
Output

```
<ddmmyyyy>
<ddmmyyyy>
<01012022>
file_name<ddmmyyyy>.csv
file_name<01012022>.csv
file_name_<ddmmyyyy>_anything.csv
```
Answer:
This might not be exactly what you want, but if the brackets always come at the start and end of the string, you can use a negative lookbehind `(?<!^)` and a negative lookahead `(?!$)` so that the first and last characters are never matched:
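```python
import re

date = "<dd>_<mm>_<yyyy>"
# (?<!^) skips the first character and (?!$) skips the last one;
# every other non-letter, including the inner < and >, is removed
date_check_pattern = re.sub(r"(?<!^)[^A-Za-z](?!$)", "", date)
print(date_check_pattern)
```

Output

```
<ddmmyyyy>
```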
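Because the lookarounds only protect the very first and last characters of the whole string, this approach breaks as soon as there is text outside the brackets. A quick check, using a hypothetical helper `strip_inner` built on the pattern `(?<!^)[^A-Za-z](?!$)` (a class without `<>`, so the inner brackets are removed):

```python
import re

def strip_inner(s):
    # Hypothetical helper: remove every non-letter except the first and last character
    return re.sub(r"(?<!^)[^A-Za-z](?!$)", "", s)

print(strip_inner("<dd>_<mm>_<yyyy>"))               # <ddmmyyyy>
print(strip_inner("file_name<dd>_<mm>_<yyyy>.csv"))  # filenameddmmyyyycsv
```

In the second case the outer brackets, underscores and the `.` are all lost, which is where the two-step approach from the first answer is the better fit.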
Answer:
I suggest not using regex for this. You can use the following function instead, and customize `specialChar` to choose which characters are removed between the first `<` and the last `>`.
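```python
def removeSpecialChar(date):
    specialChar = "_<>"          # characters to drop between the outer brackets
    lastClose = date.rfind(">")  # index of the last ">" (-1 if there is none)
    if lastClose == -1 or lastClose < date.find("<"):
        return date  # no "<" ... ">" section to clean
    result = ""
    is_open = False  # True between the first "<" and the last ">"
    for index, c in enumerate(date):
        if not is_open and c == "<":
            # First "<": keep it and start filtering
            is_open = True
            result += c
        elif is_open and index == lastClose:
            # Last ">": keep it and stop filtering
            is_open = False
            result += c
        elif not is_open or c not in specialChar:
            result += c
    return result
```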
Example:

```pycon
>>> removeSpecialChar("<dd>_<mm>_<yyyy>")
'<ddmmyyyy>'
>>> removeSpecialChar("Date: (<dd>_<mm>_<yyyy>)")
'Date: (<ddmmyyyy>)'
```
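The same non-regex idea can be written more compactly with `str.find` and `str.rfind` instead of a character-by-character loop. This is a sketch, assuming the part to clean is everything between the first `<` and the last `>`; the helper name `remove_inner_special` and its `special` parameter are hypothetical.

```python
def remove_inner_special(s, special="_<>"):
    # Hypothetical helper: keep the first "<" and the last ">",
    # and drop every character from `special` that lies between them.
    start = s.find("<")
    end = s.rfind(">")
    if start == -1 or end <= start:
        return s  # no bracketed section to clean
    inner = "".join(c for c in s[start + 1:end] if c not in special)
    return s[:start + 1] + inner + s[end:]


print(remove_inner_special("<dd>_<mm>_<yyyy>"))                         # <ddmmyyyy>
print(remove_inner_special("Date: (<dd>_<mm>_<yyyy>)"))                 # Date: (<ddmmyyyy>)
print(remove_inner_special("file_name_<dd>_<mm>_<yyyy>_anything.csv"))  # file_name_<ddmmyyyy>_anything.csv
```

Slicing keeps everything outside the outer brackets untouched, so underscores in surrounding text such as `file_name_` survive.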