Regex to remove all kinds of bullet points and numbers or characters and more

Time:12-06

I am trying to remove bullet points that come in several different formats. These are the cases I have:

c.2 Employed population below international poverty line, by sex and age (%) Age: 15
b.1 Employed population below international poverty line, by sex and age (%) Age: 15
a.1 Employed population below international poverty line, by sex and age (%) Age: 15
1. Employed population below international poverty line, by sex and age (%) Age: 15
1.2 Employed population below international poverty line, by sex and age (%) Age: 15
1.1.1 Employed population below international poverty line, by sex and age (%) Age: 15
5.6.2 (S.1.C.1) Employed population below international poverty line, by sex and age (%) Age: 15
5.6.2 (S.3) Employed population below international poverty line, by sex and age (%) Age: 15
5.6.2 (S.4.C.13) Employed population below international poverty line, by sex and age (%) Age: 15

I want a regex that removes the bullet point, whatever form it takes, and leaves only: Employed population below international poverty line, by sex and age (%) Age: 15

I tried ^(?:\d+\.)+\d*\s* and it works, but it only matches 1. or 1.2 or 1.1.1. That was all I needed at first, but my input has since changed to the cases listed above.

Thank you in advance. Side note: I use Python 3.

CodePudding user response:

^[a-z\d+]\.(\d+)?\.?(\d+)?(\s\(.*\)\s)?\s

This one catches all the bullet types in your example; here's the proof: https://regex101.com/r/sj4PgN/2

CodePudding user response:

You can use

 ^(?:[a-z]|\d+)(?:\.\d+)*\.?\s*(?:\([^()]*\)\s*)?

Explanation

  • ^ Start of string (or of each line when re.MULTILINE is set)
  • (?:[a-z]|\d+) Match either a single char a-z or 1+ digits
  • (?:\.\d+)* Optionally repeat . and 1+ digits
  • \.? Match an optional dot
  • \s* Match optional whitespace chars
  • (?:\([^()]*\)\s*)? Optionally match a part (...) followed by optional whitespace chars

In the replacement use an empty string.
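
As a rough sketch of the breakdown above (not part of the original answer), the same pattern can be written with re.VERBOSE so that each piece carries its own comment. The names BULLET and strip_bullet are hypothetical, chosen only for this illustration.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
# In verbose mode, whitespace in the pattern is ignored and "#" starts a comment.
BULLET = re.compile(
    r"""
    ^                    # start of the string
    (?:[a-z]|\d+)        # a single lowercase letter, or one or more digits
    (?:\.\d+)*           # zero or more ".digits" groups, e.g. ".6.2"
    \.?                  # an optional trailing dot, as in "1."
    \s*                  # optional whitespace
    (?:\([^()]*\)\s*)?   # an optional "(...)" part plus trailing whitespace
    """,
    re.VERBOSE,
)


def strip_bullet(line):
    """Hypothetical helper: remove a leading bullet from a single line."""
    return BULLET.sub("", line)


print(strip_bullet("5.6.2 (S.4.C.13) Employed population"))  # Employed population
print(strip_bullet("c.2 Employed population"))               # Employed population
print(strip_bullet("1. Employed population"))                # Employed population
```

One caveat that follows from the pattern itself: because the letter alternative does not require a dot after it, a line that starts with a lowercase letter and has no bullet at all (say "employed ...") would lose its first character. The sample data always starts with an uppercase "E", so it is not affected.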

If the part between the parentheses always follows the specific pattern of an uppercase char A-Z followed by a dot and digit(s), optionally repeated:

^(?:[a-z]|\d+)(?:\.\d+)*\.?\s*(?:\([A-Z]\.\d+(?:\.[A-Z]\.\d+)*\)\s*)?
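
To see where the two variants differ, here is a small comparison, offered as a sketch only. LOOSE and STRICT are hypothetical names, and the "(draft)" sample is made up for illustration; it does not come from the question's data.

```python
import re

# LOOSE accepts anything between the parentheses; STRICT only accepts
# groups shaped like "S.1" or "S.4.C.13" (uppercase letter, dot, digits).
LOOSE = re.compile(r"^(?:[a-z]|\d+)(?:\.\d+)*\.?\s*(?:\([^()]*\)\s*)?")
STRICT = re.compile(
    r"^(?:[a-z]|\d+)(?:\.\d+)*\.?\s*(?:\([A-Z]\.\d+(?:\.[A-Z]\.\d+)*\)\s*)?"
)

samples = [
    "5.6.2 (S.4.C.13) Employed population",
    "5.6.2 (draft) Employed population",  # made-up case, not in the question
]

for text in samples:
    print(repr(LOOSE.sub("", text)), "|", repr(STRICT.sub("", text)))
# 'Employed population' | 'Employed population'
# 'Employed population' | '(draft) Employed population'
```

The stricter pattern is the safer choice if the text after the bullet might itself begin with a parenthesised phrase that should be kept.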

Example

import re

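# Leading bullet: a letter or number, optional ".digits" groups, an optional dot,
# and an optional "(...)" part.
pattern = r"^(?:[a-z]|\d+)(?:\.\d+)*\.?\s*(?:\([^()]*\)\s*)?"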

s = ("c.2  Employed population below international poverty line, by sex and age (%) Age: 15 \n"
            "b.1  Employed population below international poverty line, by sex and age (%) Age: 15 \n"
            "a.1  Employed population below international poverty line, by sex and age (%) Age: 15 \n"
            "1. Employed population below international poverty line, by sex and age (%) Age: 15 \n"
            "1.2  Employed population below international poverty line, by sex and age (%) Age: 15  \n"
            "1.1.1 Employed population below international poverty line, by sex and age (%) Age: 15  \n"
            "5.6.2 (S.1.C.1) Employed population below international poverty line, by sex and age (%) Age: 15 \n"
            "5.6.2 (S.3) Employed population below international poverty line, by sex and age (%) Age: 15 \n"
            "5.6.2 (S.4.C.13) Employed population below international poverty line, by sex and age (%) Age: 15 ")

# re.MULTILINE makes ^ match at the start of every line, not just the start of s.
result = re.sub(pattern, "", s, flags=re.MULTILINE)
if result:
    print(result)

Output

Employed population below international poverty line, by sex and age (%) Age: 15 
Employed population below international poverty line, by sex and age (%) Age: 15 
Employed population below international poverty line, by sex and age (%) Age: 15 
Employed population below international poverty line, by sex and age (%) Age: 15 
Employed population below international poverty line, by sex and age (%) Age: 15  
Employed population below international poverty line, by sex and age (%) Age: 15  
Employed population below international poverty line, by sex and age (%) Age: 15 
Employed population below international poverty line, by sex and age (%) Age: 15 
Employed population below international poverty line, by sex and age (%) Age: 15 