How to match a value zero or one time in a regex using Python re.findall

Time:02-01

I want to match a pattern like java -c 123.java or java 123.java. Note that -c is optional: it can appear once or not at all. So far I am using

java\s+-c\s+[\d]+\.java|java\s+[\d]+\.java

which works fine. Please don't check the validity of the command; this is just a sample. Rather than using the pipe symbol, I would prefer something like

java\s+(-c){0,1}\s+[\d]+\.java

but when I use re.findall it returns an empty string, although it works fine with re.search. Since re.findall is compulsory for me, is grouping like (-c) the correct way, or can you suggest any changes to the above regex?

Code:

import re

seq = "java -c 123.java"
pattern = r"java\s+(-c){0,1}\s+[\d]+\.java"
pattern = re.compile(pattern)
pattern.findall(seq)

Output: ['-c']. I want to get java -c 123.java. As @9769953 pointed out, if seq="java 123.java", the output is an empty list, and if seq="java  123.java" (note the extra spaces), the output is ['']. @mozway, I have tried what you suggested; when I use

java\s+(-c)?\s+[\d]+\.java

it returns ['-c']. What am I doing wrong?

CodePudding user response:

From what I understand, you want to find the whole pattern of java <possible option flag> option-value with re.findall, while also retaining the possibility to use re.search (the latter will only find the first occurrence, if any).

I assume this means the input could be

text = "blah blah java -c 123.java blah blah java 123.java"

and you want to find the two occurrences.

When the pattern contains capturing groups, re.findall returns the text of those groups rather than the whole match. So you need to group the part you want returned, which in this case is the full pattern. To avoid also capturing the optional -c, you need to make that inner group non-capturing.

A normal (capturing) group is surrounded by parentheses; a non-capturing group starts with (?: and ends with the usual closing ).
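As a minimal sketch of how grouping changes what re.findall returns (the variable names with_capture and without_capture are only illustrative, and \d+ is used here in place of [\d]+ for brevity):

```python
import re

text = "blah blah java -c 123.java blah blah java 123.java"

# One capturing group: findall returns only that group's text for each match,
# and an empty string when the optional group did not take part in the match.
with_capture = re.findall(r"java\s+(-c\s+)?\d+\.java", text)

# Non-capturing group and no other groups: findall returns the whole match.
without_capture = re.findall(r"java\s+(?:-c\s+)?\d+\.java", text)

print(with_capture)     # ['-c ', '']
print(without_capture)  # ['java -c 123.java', 'java 123.java']
```

With no capturing group at all, the outer parentheses around the full pattern are not strictly needed for re.findall; they mainly make group(1) available when the same pattern is used with re.search.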

Together with the allowance for a single whitespace character if -c is not present (rather than two consecutive matches, \s+\s+, which would require at least two whitespace characters)[1], and with the simplification of using ? for an optional match, the pattern would be:

pattern = r"(java\s+(?:-c\s+)?[\d]+\.java)"

This also uses a raw string (the r prefix), which prevents Python from interpreting some backslashed characters as special escape sequences; that interpretation is often not what one wants in a regular expression.
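To illustrate the difference, here is a small sketch using \b, which is not part of the pattern above but shows the effect clearly (the names plain and raw are only illustrative):

```python
import re

# "\b" in a normal string is a single character (backspace);
# r"\b" is two characters (backslash, b), which re reads as a word boundary.
lengths = (len("\b"), len(r"\b"))
print(lengths)  # (1, 2)

text = "java 123.java"
plain = re.findall("\bjava\b", text)   # looks for literal backspace characters
raw = re.findall(r"\bjava\b", text)    # looks for "java" as a whole word
print(plain)  # []
print(raw)    # ['java', 'java']
```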

With the above input text and pattern, the results are now:

>>> regex = re.compile(pattern)
>>> regex.findall(text)
['java -c 123.java', 'java 123.java']
>>> regex.search(text)
<re.Match object; span=(10, 26), match='java -c 123.java'>
>>> regex.search(text).group(1)
'java -c 123.java'

[1] This pattern does not capture java -c123.java, a form that is often standard for short (i.e., one-letter) options. If you also want to capture that possibility, change the second \s+ into \s*.
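A short sketch of that variant, assuming the only change is \s* after -c (the sample text is made up for illustration):

```python
import re

# Assumed variant of the pattern: \s* after -c allows zero or more whitespace
# characters between the flag and the file name.
pattern = re.compile(r"(java\s+(?:-c\s*)?[\d]+\.java)")

text = "java -c123.java and java -c 123.java and java 123.java"
matches = pattern.findall(text)
print(matches)  # ['java -c123.java', 'java -c 123.java', 'java 123.java']
```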
