Regex to match text before numbers and special characters

I have a set of strings, say:

Suitcase 6l
Backpack (28kg)
Duffel Bag 6kg
Purse [3kg]
Duffel Bag [25l]
Duffel Bag 10l

I want to extract only the bag type, that is, the text before the number, the space in front of it, and any special character such as [ or (, like this:

Suitcase
Backpack
Duffel Bag
Purse

I tried a case-insensitive match on the non-digit characters, but I don't know how to exclude the special characters and the trailing space.

(?i)(\D*^)

Can someone help me do this with a regular expression?

CodePudding user response：

You could match the listed formats, a number followed by l or kg, with or without surrounding () or [], and capture the bag type in a group.

For a case insensitive match, you can prepend the regex with (?i) or in Python use the re.I flag.

^([A-Z].*?)\s+(?:\[\d+(?:l|kg)]|\(\d+(?:l|kg)\)|\d+(?:l|kg)\b)

^ Start of string
([A-Z].*?) Capture group 1: start the match with a char A-Z, then match as few chars as possible
\s+ Match 1+ whitespace chars
(?: Non capture group for the alternatives
- \[\d+(?:l|kg)] Match 1+ digits and either l or kg between [...]
- | Or
- \(\d+(?:l|kg)\) The same between (...)
- | Or
- \d+(?:l|kg)\b Match 1+ digits and either l or kg, followed by a word boundary
) Close the non capture group
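
As a minimal sketch of how the pattern above could be used from Python, assuming the sample strings from the question are processed one per call: the names BAG_PATTERN, bag_type and samples are made up for this illustration, and re.I supplies the case-insensitive matching mentioned earlier.

```python
import re

# Pattern from the answer above; re.I makes [A-Z], l and kg case-insensitive.
BAG_PATTERN = re.compile(
    r"^([A-Z].*?)\s+(?:\[\d+(?:l|kg)]|\(\d+(?:l|kg)\)|\d+(?:l|kg)\b)",
    re.I,
)


def bag_type(text):
    """Return the text captured in group 1, or None if the line does not match."""
    match = BAG_PATTERN.match(text)
    return match.group(1) if match else None


# Sample strings copied from the question.
samples = [
    "Suitcase 6l",
    "Backpack (28kg)",
    "Duffel Bag 6kg",
    "Purse [3kg]",
    "Duffel Bag [25l]",
    "Duffel Bag 10l",
]
bag_types = [bag_type(s) for s in samples]
print(bag_types)
# ['Suitcase', 'Backpack', 'Duffel Bag', 'Purse', 'Duffel Bag', 'Duffel Bag']
```

Because the bag type sits in group 1, the whitespace and the unit are consumed by the match but never end up in the result.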


CodePudding user response：

This regex will get you pretty close, with just the possibility of some extra spaces captured which you could get rid of with trim():

\b[a-zA-Z ]+\b

This basically says to find the longest run of letters and spaces, bounded by word boundaries, that doesn't contain any numbers or special characters.
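
A rough Python sketch of this approach, assuming only the first match in each string is wanted and using str.strip() in place of trim(); the names LETTERS_AND_SPACES and leading_words are invented for the example.

```python
import re

# Letters and spaces only, bounded by word boundaries on both sides.
LETTERS_AND_SPACES = re.compile(r"\b[a-zA-Z ]+\b")


def leading_words(text):
    """Return the first run of letters and spaces, with outer spaces stripped."""
    match = LETTERS_AND_SPACES.search(text)
    return match.group(0).strip() if match else None


# The raw match can carry a trailing space when a digit follows directly.
raw = LETTERS_AND_SPACES.search("Duffel Bag 10l").group(0)
print(repr(raw))                         # 'Duffel Bag '
print(leading_words("Duffel Bag 10l"))   # Duffel Bag
print(leading_words("Purse [3kg]"))      # Purse
```

The trailing space appears because the position between a space and a digit counts as a word boundary, which is why the strip step is needed.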

CodePudding user response：

I believe this should be what you're looking for

[[:alpha:]]+(\s[[:alpha:]]+)?(?!\S*\n)

[[:alpha:]]+ matches one or more letters
(\s[[:alpha:]]+)? optionally matches a whitespace character followed by one or more letters
(?!\S*\n) is a negative lookahead: if, looking forward, there is an optional run of non-whitespace characters followed by a newline, the match is discarded.
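
For Python users, here is a hedged adaptation rather than the pattern exactly as posted. Python's re module does not support POSIX classes such as [[:alpha:]], so this sketch swaps in [A-Za-z] (ASCII letters only). It also replaces \s with a literal space, because \s matches a newline too and would let a unit at the end of one line join the first word of the next (for example "l\nBackpack"). It assumes the input is one multi-line string in which every line, including the last, ends with a newline; ALPHA_WORDS, text and found are illustrative names.

```python
import re

# Adapted pattern: [A-Za-z] stands in for [[:alpha:]], and a literal space
# stands in for \s so that a match cannot run across a line break.
ALPHA_WORDS = re.compile(r"[A-Za-z]+(?: [A-Za-z]+)?(?!\S*\n)")

# Assumption: every line ends with "\n", including the last one.
text = (
    "Suitcase 6l\n"
    "Backpack (28kg)\n"
    "Duffel Bag 6kg\n"
    "Purse [3kg]\n"
    "Duffel Bag [25l]\n"
    "Duffel Bag 10l\n"
)

found = ALPHA_WORDS.findall(text)
print(found)
# ['Suitcase', 'Backpack', 'Duffel Bag', 'Purse', 'Duffel Bag', 'Duffel Bag']
```

The lookahead is what drops the units: "kg" and "l" are always followed by optional non-whitespace characters and then the newline, so those candidate matches are rejected.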