As the title says, I want to extract the text between the last two occurrences of a character in a string.
I have:
'9500 anti-Xa IU/ml - 0,6 ml 5700 IU -'
'120 mg/ml – 0.165 ml -'
'300-300-300 IR/ml or IC/ml - 10 ml -'
'Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -'
I want to get:
'0,6 ml 5700 IU'
'0.165 ml'
'10 ml'
'15 g'
I tried using -\s*.*- but it matches everything between the first and the last -. What's the correct regex to use?
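A quick check in Python shows the over-match. Because .* is greedy, the match starts at the first - in the string (the one in anti-Xa) and runs to the last -. The sample string is the first line above.

```python
import re

s = '9500 anti-Xa IU/ml - 0,6 ml 5700 IU -'

# '.*' is greedy: the match starts at the FIRST '-' ("anti-Xa")
# and extends to the LAST '-' in the string.
greedy = re.search(r'-\s*.*-', s).group()
print(greedy)  # -Xa IU/ml - 0,6 ml 5700 IU -
```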
Answer:
With search:
import re
[re.search(r'[-–]\s*([^-–]+?)\s*[-–][^-–]*$', x).group(1) for x in l]
Or split:
[re.split(r'\s+[-–]\s*', x)[-2] for x in l]
Output: ['0,6 ml 5700 IU', '0.165 ml', '10 ml', '15 g']
Input used:
l = ['9500 anti-Xa IU/ml - 0,6 ml 5700 IU -',
'120 mg/ml – 0.165 ml -',
'300-300-300 IR/ml or IC/ml - 10 ml -',
'Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -'
]
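A self-contained version of the two one-liners, with the input list defined first so the snippet runs top to bottom. The quantifiers +? and \s+ are a reconstruction of what the page rendering appears to have dropped, so treat them as an assumption; with them, both approaches agree on the sample data.

```python
import re

l = ['9500 anti-Xa IU/ml - 0,6 ml 5700 IU -',
     '120 mg/ml – 0.165 ml -',
     '300-300-300 IR/ml or IC/ml - 10 ml -',
     'Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -']

# search: a dash, then the shortest dash-free run (captured), then a dash
# followed only by dash-free text up to the end of the string
by_search = [re.search(r'[-–]\s*([^-–]+?)\s*[-–][^-–]*$', x).group(1) for x in l]

# split: only dashes preceded by whitespace act as separators, so
# "anti-Xa" and "300-300" stay intact; the string ends with " -", so the
# last element is '' and [-2] is the text between the last two dashes
by_split = [re.split(r'\s+[-–]\s*', x)[-2] for x in l]

print(by_search)  # ['0,6 ml 5700 IU', '0.165 ml', '10 ml', '15 g']
print(by_split)   # ['0,6 ml 5700 IU', '0.165 ml', '10 ml', '15 g']
```

Leaving out maxsplit in re.split means [-2] still picks the last segment when a string has more than two spaced dashes.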
Answer:
You can use
[^-–—\s][^-–—]*?(?=\s*[-–—][^-–—]*$)
See the regex demo. Details:
[^-–—\s] - any char other than whitespace, a hyphen (-), an en dash (–) or an em dash (—)
[^-–—]*? - zero or more chars other than -, – and —, as few as possible
(?=\s*[-–—][^-–—]*$) - a positive lookahead that requires, immediately to the right of the current location, zero or more whitespaces, then a -, – or — char, and then zero or more chars other than -, – and — till the end of the string.
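A minimal sketch of applying this pattern in Python with re.search, one string at a time. The per-string loop and the list name l are assumptions; the answer itself only gives the pattern.

```python
import re

# The answer's pattern: a non-dash, non-space start, then the shortest
# dash-free run, checked by a lookahead for optional whitespace, one dash
# (-, – or —) and only dash-free text up to the end of the string.
pattern = re.compile(r'[^-–—\s][^-–—]*?(?=\s*[-–—][^-–—]*$)')

l = ['9500 anti-Xa IU/ml - 0,6 ml 5700 IU -',
     '120 mg/ml – 0.165 ml -',
     '300-300-300 IR/ml or IC/ml - 10 ml -',
     'Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -']

# The whole match (no capture group) is the wanted text.
result = [pattern.search(x).group() for x in l]
print(result)  # ['0,6 ml 5700 IU', '0.165 ml', '10 ml', '15 g']
```

Since the lookahead is zero-width, no strip() is needed: the match never includes the surrounding spaces or dashes.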
Answer:
With your shown samples only, please try the following regex in Python code, written and tested in Python 3. Here is the online demo for the regex used.
import re
var="""9500 anti-Xa IU/ml - 0,6 ml 5700 IU -
120 mg/ml - 0.165 ml -
300-300-300 IR/ml or IC/ml - 10 ml -
Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -"""
[x.strip(' ') for x in re.findall(r'(?<=\s-|\s–)(.*?)(?=-)',var,re.M)]
Output will be as follows:
['0,6 ml 5700 IU', '0.165 ml', '10 ml', '15 g']
Explanation: this uses the findall function from Python 3's re module with the regex r'(?<=\s-|\s–)(.*?)(?=-)', which captures the text that follows a whitespace plus a hyphen or en dash, up to the next hyphen. Then strip removes the leading and trailing spaces from each match to get the expected output.
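One thing worth checking with this approach: the lookbehind fires after every whitespace-plus-dash, so a line with more than two such dashes yields several captures. A sketch of keeping only the last capture per line, using a hypothetical extra line a - b - c - that is not part of the question's data:

```python
import re

# Hypothetical second line with three " -" separators, added only to
# show what findall does when a line has more than two spaced dashes.
var = """9500 anti-Xa IU/ml - 0,6 ml 5700 IU -
a - b - c -"""

pattern = r'(?<=\s-|\s–)(.*?)(?=-)'

# One capture per whitespace+dash, so the second line yields ' b ' and ' c '.
print(re.findall(pattern, var))

# Keep only the last capture on each line (assumes every line has at
# least one match; otherwise [-1] raises IndexError).
last_per_line = [re.findall(pattern, line)[-1].strip(' ')
                 for line in var.splitlines()]
print(last_per_line)  # ['0,6 ml 5700 IU', 'c']
```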
Answer:
Try to also match the blank space before the last dash (-):
\s\-\s(.*)\s\-
By the way, the regex101 website may help you the next time you have a regex issue.
EDIT
I just saw that you have two types of dash symbols: the short hyphen (-) and the longer en dash (–). Try this regex instead:
\s[-–]\s(.*)\s[-–]
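A sketch of running the edited regex over the question's four strings with re.search and reading capture group 1. It also includes a hypothetical string with three spaced dashes, which shows that the greedy (.*) spans from the first spaced dash to the last one rather than covering only the last two.

```python
import re

# [-–] accepts both the hyphen and the en dash; (.*) is greedy.
pattern = r'\s[-–]\s(.*)\s[-–]'

l = ['9500 anti-Xa IU/ml - 0,6 ml 5700 IU -',
     '120 mg/ml – 0.165 ml -',
     '300-300-300 IR/ml or IC/ml - 10 ml -',
     'Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -']

result = [re.search(pattern, x).group(1) for x in l]
print(result)  # ['0,6 ml 5700 IU', '0.165 ml', '10 ml', '15 g']

# Hypothetical input with three spaced dashes: the greedy capture runs
# from the first " - " to the final " -".
print(re.search(pattern, 'a - b - c -').group(1))  # b - c
```

On the question's data each string has exactly two spaced dashes, so greediness makes no difference there.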