Regex to match text before numbers and special characters

Time:12-15

I have a set of strings, for example:

  • Suitcase 6l
  • Backpack (28kg)
  • Duffel Bag 6kg
  • Purse [3kg]
  • Duffel Bag [25l]
  • Duffel Bag 10l

I want to extract only the type of bag, that is, the text before the number, the space, and any special character such as [ or (, like:

  • Suitcase
  • Backpack
  • Duffel Bag
  • Purse

I tried the pattern below to match the non-digit characters case-insensitively, but I don't know how to exclude the special characters and the space.

(?i)(\D*^)

Can someone help me do this with a regular expression?

CodePudding user response:

You could match the listed unit formats (l and kg), with or without the surrounding () or [], and capture the type of bag in a group.

For a case insensitive match, you can prepend the regex with (?i) or in Python use the re.I flag.

^([A-Z].*?)\s+(?:\[\d+(?:l|kg)]|\(\d+(?:l|kg)\)|\d+(?:l|kg)\b)
  • ^ Start of string
  • ([A-Z].*?) Capture group 1: start the match with a char A-Z, then match as few chars as possible
  • \s+ Match 1+ whitespace chars
  • (?: Non-capture group for the alternatives
    • \[\d+(?:l|kg)] Match 1+ digits and either l or kg between [...]
    • | Or
    • \(\d+(?:l|kg)\) The same between (...)
    • | Or
    • \d+(?:l|kg)\b Match 1+ digits and either l or kg, followed by a word boundary
  • ) Close the non-capture group
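
As a rough sketch, this is one way the pattern might be applied in Python. It assumes `+` quantifiers on `\s` and `\d` so that multi-digit values such as 28kg match, and it uses the `re.I` flag for case-insensitive matching. The helper name `extract_bag_type` and the `items` list are made up for illustration.

```python
import re

# Pattern from this answer; re.I lets [A-Z] match lowercase letters too.
BAG_PATTERN = re.compile(
    r"^([A-Z].*?)\s+(?:\[\d+(?:l|kg)]|\(\d+(?:l|kg)\)|\d+(?:l|kg)\b)",
    re.I,
)


def extract_bag_type(text):
    """Return the text captured in group 1, or None if the format is not recognized."""
    match = BAG_PATTERN.match(text)
    return match.group(1) if match else None


# Sample strings taken from the question.
items = ["Suitcase 6l", "Backpack (28kg)", "Duffel Bag 6kg",
         "Purse [3kg]", "Duffel Bag [25l]", "Duffel Bag 10l"]
bag_types = [extract_bag_type(item) for item in items]
print(bag_types)
```

Strings that do not end in one of the listed size formats return None here, so this approach is strict about what it accepts.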

CodePudding user response:

This regex will get you pretty close, with just the possibility of some extra spaces captured which you could get rid of with trim():

\b[a-zA-Z ]+\b

This basically says to find the longest run of letters and spaces that doesn't contain any numbers or special characters.
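
A minimal sketch of this idea in Python, assuming a `+` quantifier after the character class and taking only the first match in each string. Python's `str.strip()` stands in for `trim()`, and the helper name `first_text_run` is hypothetical.

```python
import re

# One or more letters or spaces, delimited by word boundaries.
LETTERS_AND_SPACES = re.compile(r"\b[a-zA-Z ]+\b")


def first_text_run(text):
    """Return the first run of letters/spaces, stripped of surrounding spaces."""
    match = LETTERS_AND_SPACES.search(text)
    # strip() removes the trailing space that can be captured before a digit.
    return match.group(0).strip() if match else None


# Sample strings taken from the question.
items = ["Suitcase 6l", "Backpack (28kg)", "Duffel Bag 6kg",
         "Purse [3kg]", "Duffel Bag [25l]", "Duffel Bag 10l"]
names = [first_text_run(item) for item in items]
print(names)
```

Unlike a pattern that spells out the size formats, this one accepts any leading text made of letters and spaces, so it is more permissive about what follows the bag name.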

CodePudding user response:

I believe this should be what you're looking for

[[:alpha:]]+(\s[[:alpha:]]+)?(?!\S*\n)

[[:alpha:]]+ matches any group of letters.
(\s[[:alpha:]]+)? optionally matches a whitespace character followed by a group of letters.
(?!\S*\n) is a negative lookahead: if what follows is zero or more non-whitespace characters and then a newline, the match is discarded.
