I have a dataframe with strings containing numeric and qualifiers (>, <, <=...). I would like to extract in 2 new columns the numeric value and the qualifier string. Exception: when the initial string is a range, I would like to extract the Max value in the range.
Please see below my example:
| ID | initial_value | qualifier | num_value |
|-------- |---------------|-----------|-----------|
| Sample1 | 25.5 |= | 25.5 |
| Sample2 | = 25.5 |= | 25.5 |
| Sample3 | >35.5 |> | 35.5 |
| Sample4 | > 45.5 |> | 45.5 |
| Sample5 | <=55.5 |<= | 55.5 |
| Sample6 | <= 65.5 |<= | 65.5 |
| Sample7 | >>55.5 |>> | 55.5 |
| Sample8 | 25.0-75.0 |- | 75.0 |
| Sample9 | 25.0 - 75.0 |- | 75.0 |
In advance thank you so much for your help.
Cheers
CodePudding user response:
You can use the ([^\s\d.] )?\s*([\d.] )$
regex and fillna
:
df[['qualifier', 'num_value']] = df['initial_value'].str.extract('([^\s\d.] )?\s*([\d.] )$').fillna('=')
output:
ID initial_value qualifier num_value
0 Sample1 25.5 = 25.5
1 Sample2 = 25.5 = 25.5
2 Sample3 >35.5 > 35.5
3 Sample4 > 45.5 > 45.5
4 Sample5 <=55.5 <= 55.5
5 Sample6 <= 65.5 <= 65.5
6 Sample7 >>55.5 >> 55.5
7 Sample8 25.0-75.0 - 75.0
8 Sample9 25.0 - 75.0 - 75.0
CodePudding user response:
With PyPi
's regex
module you could use
(?(DEFINE)
(?<v>\d (?:\.\d )?)
(?<q>[-=<>] )
)
(?|
(?&v)\s*(?P<qualifier>-)\s*(?P<value>(?&v))
|
(?P<qualifier>(?&q))?\s*(?P<value>(?&v))
)