Home > OS >  How to extract numerical value and corresponding qualifier using Python/RegEx
How to extract numerical value and corresponding qualifier using Python/RegEx

Time:11-05

I have a dataframe with strings containing numeric and qualifiers (>, <, <=...). I would like to extract in 2 new columns the numeric value and the qualifier string. Exception: when the initial string is a range, I would like to extract the Max value in the range.

Please see below my example:

| ID      | initial_value | qualifier | num_value |
|-------- |---------------|-----------|-----------|
| Sample1 | 25.5          |=          | 25.5      |
| Sample2 | = 25.5        |=          | 25.5      |
| Sample3 | >35.5         |>          | 35.5      |
| Sample4 | > 45.5        |>          | 45.5      |
| Sample5 | <=55.5        |<=         | 55.5      |
| Sample6 | <= 65.5       |<=         | 65.5      |
| Sample7 | >>55.5        |>>         | 55.5      |
| Sample8 | 25.0-75.0     |-          | 75.0      |
| Sample9 | 25.0 - 75.0   |-          | 75.0      |

In advance thank you so much for your help.

Cheers

CodePudding user response:

You can use the ([^\s\d.] )?\s*([\d.] )$ regex and fillna:

df[['qualifier', 'num_value']] = df['initial_value'].str.extract('([^\s\d.] )?\s*([\d.] )$').fillna('=')

output:

        ID initial_value qualifier num_value
0  Sample1          25.5         =      25.5
1  Sample2        = 25.5         =      25.5
2  Sample3         >35.5         >      35.5
3  Sample4        > 45.5         >      45.5
4  Sample5        <=55.5        <=      55.5
5  Sample6       <= 65.5        <=      65.5
6  Sample7        >>55.5        >>      55.5
7  Sample8     25.0-75.0         -      75.0
8  Sample9   25.0 - 75.0         -      75.0

CodePudding user response:

With PyPi's regex module you could use

(?(DEFINE)
    (?<v>\d (?:\.\d )?)
    (?<q>[-=<>] )
)

(?|
    (?&v)\s*(?P<qualifier>-)\s*(?P<value>(?&v))
    |
    (?P<qualifier>(?&q))?\s*(?P<value>(?&v))
)

See a demo on regex101.com.

  • Related