Home > Software engineering >  Regular express to find all lower case string with dot with Python
Regular express to find all lower case string with dot with Python

Time:12-14

Trying with python to find all strings inside a double quote, and with domain name like format, such as "abc.def.ghi".

I am currently using re.findall('\"([a-z\\.] [a-z]*)\"', input_string),

[a-z\\.] is for abc., def. and [a-z]* is for ghi.

So far it has no issue to match all string like "abc.def.ghi", but it also matches string that contains no ., such as "opq", "rst".

Question is, how to get rid of those string contains no dot . using regx?

CodePudding user response:

Pattern

'"([a-z] (?:\.[a-z] ) )"'

Explanation

  • Start & end with a double quote
  • capture group
    • [a-z] one letter a-z
    • (?:...) nested non-capturomg subgroup of the capture group
      • period followed by at least one letter a-z (repeated at least once)
      • the nested subgroup is repeated at least once
      • make subgroup non-capturing since otherwise findall will only report this subgroup

Usage

pattern = re.compile(r'\"[a-z] (?:\.[a-z] ) \"')
tests = ['"abc.def.ghi"', '"opq"']
for input_string in tests:
    print(f"input_string: {input_string}, findall:  {pattern.findall(input_string)}")

Output

input_string: "abc.def.ghi", found:  ['abc.def.ghi']
input_string: "opq", found:  []

CodePudding user response:

[a-z\\.] 

this part. matches any character a-z or . if you want the dot to be there, you will have to move it outside the character set something like

([a-z] \\.) 

result: visualization

  • Related