I tried many patterns, but cannot get the correct result.
I want to match only the floats on lines that start with the keyword `range`. My trouble is that `range` can be followed by a colon, by one or more spaces, or by a combination of the two (see the inputs below).
My best attempt uses two patterns:
#1. `(?i)(?<=range[: ])[:a-zA-Z0-9.$ -]+`
#2. `[0-9.]+`
First I run the regex with pattern #1, then I take the output of pattern #1 and run the regex again with pattern #2.
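For reference, the two-pass approach described above can be sketched in Python as follows. It assumes both patterns are meant to end with a `+` quantifier, and it uses `re.search` for pattern #1 because that pattern is not anchored; `two_pass` is a hypothetical helper name.

```python
import re

# Pattern #1 from the question (assuming a trailing "+"):
# the text that follows "range:" or "range ".
PATTERN_1 = re.compile(r"(?i)(?<=range[: ])[:a-zA-Z0-9.$ -]+")
# Pattern #2 from the question (assuming a trailing "+"): runs of digits and dots.
PATTERN_2 = re.compile(r"[0-9.]+")


def two_pass(line):
    """Run pattern #1, then run pattern #2 on whatever pattern #1 matched."""
    first = PATTERN_1.search(line)
    if first is None:
        return []
    return PATTERN_2.findall(first.group())


for line in ["range: $0.82", "range:0.82", "range : 0.82 - 0.85", "range 0.82 0.85"]:
    print(line, "->", two_pass(line))
```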
How can I do this with one single pattern? My code is in Python. Thanks so much!
Input: range: $0.82 --> Expected output: 0.82
Input: range:0.82 --> Expected output: 0.82
Input: range: 0.82 - 0.85 --> Expected output: 0.82, 0.85
Input: range : 0.82 - 0.85 --> Expected output: 0.82, 0.85
Input: range : 0.82 - 0.85 --> Expected output: 0.82, 0.85
Input: range 0.82 0.85 --> Expected output: 0.82, 0.85
Answer:
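If you can use the Python `regex` module from PyPI, you can get multiple occurrences with a single pattern, because that module (unlike the built-in `re`) supports variable-length lookbehind: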
`(?<=^range\b[\s:$-\d.]*)\d+(?:\.\d+)?`
Explanation
- `(?<=` Positive lookbehind, assert that to the left is:
  - `^range\b` Match `range` at the start of the string
  - `[\s:$-\d.]*` Optionally match any of the allowed characters that can appear in between (whitespace, `:`, `$`, `-`, digits and `.`)
- `)` Close the lookbehind assertion
- `\d+(?:\.\d+)?` Match 1+ digits with an optional decimal part
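The `*` inside the lookbehind makes it variable-width, which is why this answer relies on the `regex` module: the standard-library `re` only accepts fixed-width lookbehinds. A minimal check, where `compiles_with_re` is a hypothetical helper. In the tested pattern the hyphen is moved to the end of the character class, because `re` would otherwise reject `$-\d` as a bad character range before it even reaches the lookbehind check.

```python
import re

# The "*" inside the lookbehind makes its width variable.
# (Hyphen moved to the end of the class so re does not fail on "$-\d" first.)
VARIABLE_LOOKBEHIND = r"(?<=^range\b[\s:$\d.-]*)\d+(?:\.\d+)?"
# A fixed-width lookbehind, which re does accept.
FIXED_LOOKBEHIND = r"(?<=range:)\d+(?:\.\d+)?"


def compiles_with_re(pattern):
    """Return True if the standard-library re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error as exc:
        print("re.error:", exc)
        return False
    return True


print(compiles_with_re(FIXED_LOOKBEHIND))     # True
print(compiles_with_re(VARIABLE_LOOKBEHIND))  # prints the re.error message, then False
```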
Example
```python
import regex  # third-party module: pip install regex

strings = [
    "range: $0.82",
    "range:0.82",
    "range: 0.82 - 0.85",
    "range : 0.82 - 0.85",
    "range : 0.82 - 0.85",
    "range 0.82 0.85"
]
pattern = r"(?<=^range\b[\s:$-\d.]*)\d+(?:\.\d+)?"
for s in strings:
    print(regex.findall(pattern, s))
```
Output
['0.82']
['0.82']
['0.82', '0.85']
['0.82', '0.85']
['0.82', '0.85']
['0.82', '0.85']
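If installing `regex` is not an option, a similar result can be sketched with the standard `re` module by matching the `range` prefix first and then scanning for numbers from where the prefix ends, using the `pos` argument of `Pattern.findall`. The names `PREFIX`, `NUMBER` and `range_numbers` are hypothetical. This is slightly more permissive than the lookbehind version: once the prefix matches, it returns every number on the line, not only numbers separated by the characters in the lookbehind's class.

```python
import re

# Plays the role of the lookbehind: "range" at the start, then separators.
PREFIX = re.compile(r"range\b[\s:$-]*")
# An integer or decimal number.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def range_numbers(s):
    """Return every number on a line that starts with 'range', else []."""
    head = PREFIX.match(s)  # match() only tries the start of the string
    if head is None:
        return []
    # Pattern.findall accepts a start position, so scanning begins after the prefix.
    return NUMBER.findall(s, head.end())


for line in ["range: $0.82", "range : 0.82 - 0.85", "price: 0.82"]:
    print(line, "->", range_numbers(line))
```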
Answer:
You could avoid regex completely. Those lines are not difficult to parse.
```python
def parse(line):
    # Ignore lines that do not start with the keyword.
    if not line.startswith('range'):
        return
    # Turn ':' into a separator and drop '$' so each number becomes its own token.
    line = line.replace(':', ' ').replace('$', '')
    for token in line.split():
        try:
            yield float(token)
        except ValueError:
            # Not a number ('range', '-', ...): skip it.
            continue

input_data = ['range: $0.82',
              'range:0.82',
              'range: 0.82 - 0.85',
              'range : 0.82 - 0.85',
              'range : 0.82 - 0.85',
              'range 0.82 0.85']
r = [list(i) for i in map(parse, input_data)]
print(r)
```
Output:
[[0.82], [0.82], [0.82, 0.85], [0.82, 0.85], [0.82, 0.85], [0.82, 0.85]]
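One caveat with the split-based `parse()`: it only finds numbers that are separated by whitespace, so a line such as `range: 0.82-0.85` (not one of the question's inputs) yields nothing, because `float('0.82-0.85')` raises `ValueError`. A sketch of a variant that also treats `-` as a separator, assuming the values are never negative; `parse_tolerant` is a hypothetical name.

```python
def parse_tolerant(line):
    """Like parse(), but '-' also separates numbers.

    Assumption: values are never negative, so '-' is always a separator.
    """
    if not line.startswith('range'):
        return
    line = line.replace(':', ' ').replace('-', ' ').replace('$', '')
    for token in line.split():
        try:
            yield float(token)
        except ValueError:
            continue


print(list(parse_tolerant('range: 0.82-0.85')))
```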
Answer:
This seems to work for me, although there are probably more efficient ways of doing it:
```python
import re

input_data = ['range: $0.82',
              'range:0.82',
              'range: 0.82 - 0.85',
              'range : 0.82 - 0.85',
              'range : 0.82 - 0.85',
              'range 0.82 0.85']
for i in range(len(input_data)):
    output = re.findall(r'(range)(\s*:?\s*[$]*)([0-9]*.[0-9]*)(\s*-?\s*)([0-9]*.[0-9]*)?', input_data[i])
    a = list(output[0])[2]  # third group: first number
    b = list(output[0])[4]  # fifth group: second number ('' if absent)
    print(f'Input: {input_data[i]} --> Expected output: {a} , {b}')
```
OUTPUT:
Input: range: $0.82 --> Expected output: 0.82 ,
Input: range:0.82 --> Expected output: 0.82 ,
Input: range: 0.82 - 0.85 --> Expected output: 0.82 , 0.85
Input: range : 0.82 - 0.85 --> Expected output: 0.82 , 0.85
Input: range : 0.82 - 0.85 --> Expected output: 0.82 , 0.85
Input: range 0.82 0.85 --> Expected output: 0.82 , 0.85
You could also add an `if` statement that checks whether `b` is empty and adjust the output as required. However, I think the main thing you wanted was a single regex that extracts the two numbers in question (when present).
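A minimal sketch of that `if` check, reusing the regex from this answer. `describe` is a hypothetical helper; it also guards against lines where `re.findall` finds no match, since indexing `output[0]` would raise `IndexError` there.

```python
import re

PATTERN = r'(range)(\s*:?\s*[$]*)([0-9]*.[0-9]*)(\s*-?\s*)([0-9]*.[0-9]*)?'


def describe(line):
    """Format the extracted numbers, leaving out the second one when it is empty."""
    matches = re.findall(PATTERN, line)
    if not matches:
        return f'Input: {line} --> no match'
    a, b = matches[0][2], matches[0][4]
    if b:  # the fifth group is '' when there is no second number
        return f'Input: {line} --> {a}, {b}'
    return f'Input: {line} --> {a}'


for line in ['range: $0.82', 'range 0.82 0.85', 'price 1']:
    print(describe(line))
```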
Regex explanation:
`r'(range)(\s*:?\s*[$]*)([0-9]*.[0-9]*)(\s*-?\s*)([0-9]*.[0-9]*)?'`

First group: `(range)`
- Puts `range` into the first group.

Second group: `(\s*:?\s*[$]*)`
- `\s*` matches zero or more whitespace characters
- `:?` matches an optional colon (`:`)
- `\s*` matches zero or more whitespace characters
- `[$]*` matches zero or more dollar signs (`$`)

Third group: `([0-9]*.[0-9]*)`
- `[0-9]*` matches zero or more digits
- `.` is meant to match the decimal point; because it is unescaped, it actually matches any single character (use `\.` to match only a literal dot)
- This is the group that holds the first number (0.82)

Fourth group: `(\s*-?\s*)`
- `\s*` matches zero or more whitespace characters
- `-?` matches an optional hyphen
- `\s*` matches zero or more whitespace characters

Fifth group: `([0-9]*.[0-9]*)?`
- `[0-9]*` matches zero or more digits
- `.` matches the decimal point (the same caveat applies)
- The trailing `?` makes the whole group optional. This is the group that holds the second number (0.85)
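Because the `.` in the third and fifth groups of this regex is unescaped, the pattern accepts any character where the decimal point should be. A quick comparison with an escaped `\.`, using `fullmatch` on a few sample strings:

```python
import re

UNESCAPED = re.compile(r'[0-9]*.[0-9]*')  # '.' matches any character
ESCAPED = re.compile(r'[0-9]*\.[0-9]*')   # '\.' matches only a literal dot

for text in ['0.82', '0x82', '0-82']:
    print(text, bool(UNESCAPED.fullmatch(text)), bool(ESCAPED.fullmatch(text)))
# 0.82 True True
# 0x82 True False
# 0-82 True False
```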
Answer:
You could use this regex to extract your data:
`^\s*range\D*(\d+(?:\.\d+)?)(?:\D*(\d+(?:\.\d+)?))?`
Regex explanation:
- `^`: beginning of string
- `\s*range`: asserts the string starts with `range` (possibly preceded by whitespace; if you don't want that, remove the `\s*`)
- `\D*`: some number of non-digit characters
- `(\d+(?:\.\d+)?)`: a number, captured in group 1
- `(?:\D*(\d+(?:\.\d+)?))?`: an optional group of some non-digits followed by a number, captured in group 2
In Python:
```python
import re

input_data = ['range: $0.82',
              'range:0.82',
              'range: 0.82 - 0.85',
              'range : 0.82 - 0.85',
              'range : 0.82 - 0.85',
              'range 0.82 0.85']
# findall returns one (group 1, group 2) tuple per match; [0] takes the first one.
results = [re.findall(r'^\s*range\D*(\d+(?:\.\d+)?)(?:\D*(\d+(?:\.\d+)?))?', d)[0] for d in input_data]
print(results)
```
Output:
[
('0.82', ''),
('0.82', ''),
('0.82', '0.85'),
('0.82', '0.85'),
('0.82', '0.85'),
('0.82', '0.85')
]
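`re.findall(...)[0]` raises `IndexError` on a line that does not match, and it reports a missing second number as `''`. A sketch that uses `re.match` instead and converts the captured numbers to floats; `extract` is a hypothetical helper name, and the pattern is the one from this answer.

```python
import re

PATTERN = re.compile(r'^\s*range\D*(\d+(?:\.\d+)?)(?:\D*(\d+(?:\.\d+)?))?')


def extract(line):
    """Return the one or two numbers after 'range' as floats ([] if no match)."""
    m = PATTERN.match(line)
    if m is None:
        return []
    # groups() gives None for the optional second number when it is absent.
    return [float(g) for g in m.groups() if g is not None]


for line in ['range: $0.82', 'range : 0.82 - 0.85', 'price: 1.0']:
    print(line, '->', extract(line))
```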