Regex to match all types of percentage

I have some percentage cases like the following:

I want to match every type of percentage except anything larger than 100. Expected output:

I have tried the pattern from "Regular Expression for Percentage of marks", but it fails to match all the cases I want. I am also replacing non-matches with an empty string. My Python code looks like this:

pattern=r'(\b(?<!\.)(?!0+(?:\.0+)?%)(?:\d|[1-9]\d|100)(?:(?<!100)\.\d+)?$)'
df['Percent']=df['Percent'].astype(str).str.extract(pattern)[0]

Many thanks.

Edit: The solution (by @rv.kvetch) matches most of the edge cases except the 0 ones, but I can work with that limitation. The original post had a requirement of not matching the 0 case or 0%.

CodePudding user response：

I'm probably very close but looks like this is working for me so far:

^(?:0{0,})((?:[1-9]{1,2}|100)?(?:\.\d+)?)%?$

Description

First non-capturing group

(?:0{0,}) - non-capturing group which matches a leading 0, that appears zero or more times.

First capture group

(?:[1-9]{1,2}|100)? - Optional, non-capturing group which matches the digits 1-9 one to two times, intended to cover the range 1-99, with an alternative so that 100 is also covered. Note that [1-9]{1,2} does not match values containing a zero digit, such as 10 or 20. This group is made optional by ? to cover cases like .24, which is still a valid percentage.
(?:\.\d+)? - Optional, non-capturing group which matches the fractional part, e.g. .123. This is optional because whole numbers like 45 are valid percentage values by themselves.

Trailing percent sign

%? - finally, here we match the optional trailing percent (%) symbol that can come at the end.
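
To make the description above easier to check, here is a minimal Python sketch that runs the pattern against a few of the sample values from this thread. It assumes the fractional part of the pattern is `\.\d+` (one or more digits). The second pattern, `VARIANT`, and the helper `extract` are hypothetical and not part of the answer: the variant uses `[1-9]\d?` so that values such as 10 or 20 are accepted, and it only allows an all-zero fraction after 100.

```python
import re

# Pattern from the answer, assuming the fractional part is "\.\d+".
ANSWER = re.compile(r'^(?:0{0,})((?:[1-9]{1,2}|100)?(?:\.\d+)?)%?$')

# Hypothetical variant (not from the answer): [1-9]\d? also accepts 10, 20, ...
# and 100 may only be followed by a zero fraction such as 100.0.
VARIANT = re.compile(r'^0*((?:[1-9]\d?)?(?:\.\d+)?|100(?:\.0+)?)%?$')


def extract(pattern, text):
    """Return the captured number (no leading zeros, no '%'), or None."""
    m = pattern.match(text)
    # An empty capture means the input held only zeros (or nothing at all).
    return m.group(1) if m and m.group(1) else None


samples = ['12.02', '81.61%', '021', '.85%', '100', '20', '19348952', '0']
for s in samples:
    print(s, extract(ANSWER, s), extract(VARIANT, s))
```

With these samples, both patterns return None for 19348952 and for 0, while only the variant returns a value for 20. Whether 100.5 should be rejected is a judgment call that depends on the data; the answer's pattern accepts it and the variant does not.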

CodePudding user response：

If you want, you can do it without using regex.

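nums = ['12.02',
        '16.59',
        '81.61%',
        '45',
        '24.812',
        '51.35',
        '19348952',
        '88.22',
        '0',
        '000',
        '021',
        '.85%',
        '100']

for n in nums:
    # Remove the percent sign, then parse as float so decimals like '12.02' work
    x = n.strip('%')
    x = float(x)
    # Keep only values that do not exceed 100
    if x <= 100:
        print(n)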
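
If the column may also contain text that is not a number, the same idea can be wrapped in a small helper. This is only a sketch: the name `is_valid_percent`, the `upper` parameter, and the lower bound of 0 are assumptions added for illustration, not part of the answer above. Note that `float` also accepts forms such as `1e2`, which may or may not be desirable.

```python
def is_valid_percent(text, upper=100.0):
    """Return True if text parses as a number between 0 and `upper`."""
    cleaned = text.strip().rstrip('%')
    try:
        value = float(cleaned)
    except ValueError:
        # Not a number at all (e.g. 'abc' or an empty string)
        return False
    # Assumed lower bound of 0; this also rejects nan and inf
    return 0 <= value <= upper


nums = ['12.02', '81.61%', '19348952', '.85%', '100', 'abc']
kept = [n for n in nums if is_valid_percent(n)]
print(kept)  # ['12.02', '81.61%', '.85%', '100']
```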