Remove substring of digits from string (Python)

Time:07-25

<elem1><elem2>20,000 Leagues Under the Sea1050251</elem2></elem1>
<elem1><elem2>1002321Robinson Crusoe1050251</elem2></elem1>

I'm working with an XML file and need to insert the elements above, extracted from it, into another XML file. The problem is that I have no idea how to remove the ids (7-digit substrings used to track position) from the string. Removing the characters between ">" and "<" isn't feasible, because the text sometimes starts with an id and sometimes with a title that begins with numbers. What I need is something that removes every 7-digit substring, and nothing else, from a string, but I've only found code that does this for a specified substring.

CodePudding user response:

You can try with regex:

import re


string = """<elem1><elem2>20,000 Leagues Under the Sea1050251</elem2></elem1>
<elem1><elem2>1002321Robinson Crusoe1050251</elem2></elem1>"""

# Matches any 7 consecutive digits, including the first 7 of a longer digit run.
# For str patterns, \d also matches non-ASCII Unicode digits unless re.ASCII is passed.
pattern = re.compile(r"\d{7}")
result = pattern.sub("", string)  # returns a copy of the string with every match replaced by ""
print(result)

Output:

<elem1><elem2>20,000 Leagues Under the Sea</elem2></elem1>
<elem1><elem2>Robinson Crusoe</elem2></elem1>
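If a title could itself contain a run of eight or more digits, the plain `\d{7}` pattern would cut seven digits out of it. A minimal sketch of a stricter variant is below, assuming ids are always exactly 7 ASCII digits and never directly adjacent to another digit. The names `ID_PATTERN`, `strip_ids` and `strip_ids_in_xml` are hypothetical helpers, not part of the answer above. Note the trade-off: an id immediately followed by a title that starts with a digit (for example `100232120,000 Leagues`) forms a longer digit run and would not be removed by this version.

```python
import re
import xml.etree.ElementTree as ET

# Lookarounds require that the 7 digits are not part of a longer digit run.
# re.ASCII restricts \d to the characters 0-9.
ID_PATTERN = re.compile(r"(?<!\d)\d{7}(?!\d)", re.ASCII)


def strip_ids(text):
    """Remove every standalone run of exactly 7 digits from text."""
    return ID_PATTERN.sub("", text)


def strip_ids_in_xml(xml_fragment):
    """Parse an XML fragment and strip ids from each element's text only."""
    root = ET.fromstring(xml_fragment)
    for element in root.iter():
        if element.text:
            element.text = strip_ids(element.text)
    # encoding="unicode" returns a str rather than bytes
    return ET.tostring(root, encoding="unicode")


print(strip_ids_in_xml(
    "<elem1><elem2>1002321Robinson Crusoe1050251</elem2></elem1>"
))
```

Working on the parsed element text rather than the raw file keeps the substitution away from tag names and attribute values, which may matter if those ever contain 7-digit numbers.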
