I am trying to use Python to find all double-quoted strings that have a domain-name-like format, such as "abc.def.ghi".
I am currently using re.findall('\"([a-z\\.]+[a-z]*)\"', input_string), where [a-z\\.]+ is meant to match abc. and def., and [a-z]* is meant to match ghi.
So far it matches every string like "abc.def.ghi" without issue, but it also matches strings that contain no dot, such as "opq" and "rst".
The question is: how can I exclude the strings that contain no dot using a regex?
CodePudding user response:
Pattern
'"([a-z]+(?:\.[a-z]+)+)"'
Explanation
- Start and end with a double quote
- (...) capture group for the text between the quotes
- [a-z]+ one or more letters a-z
- (?:...)+ nested non-capturing subgroup of the capture group
- inside the subgroup: a period followed by at least one letter a-z
- the nested subgroup is repeated at least once, so at least one dot is required
- the subgroup is non-capturing because otherwise findall would return a tuple of both groups for each match instead of just the full string
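The breakdown above can also be written into the pattern itself. The following is a minimal sketch, assuming the pattern is '"([a-z]+(?:\.[a-z]+)+)"'; it uses re.VERBOSE so each piece carries its own comment. The name DOMAIN_LIKE and the sample text are only illustrative.

```python
import re

# Same pattern, spelled out with re.VERBOSE so whitespace and comments
# inside the pattern are ignored by the regex engine.
DOMAIN_LIKE = re.compile(
    r"""
    "                # opening double quote
    (                # capture group: the text between the quotes
      [a-z]+         #   first label: one or more letters a-z
      (?:            #   non-capturing subgroup
        \.[a-z]+     #     a period followed by one or more letters
      )+             #   subgroup repeated at least once, so a dot is required
    )
    "                # closing double quote
    """,
    re.VERBOSE,
)

text = 'x = "abc.def.ghi", y = "opq", z = "rst.uv"'
matches = DOMAIN_LIKE.findall(text)
print(matches)  # ['abc.def.ghi', 'rst.uv']
```

Because the pattern has exactly one capture group, findall returns plain strings without the surrounding quotes.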
Usage
import re

pattern = re.compile(r'"([a-z]+(?:\.[a-z]+)+)"')
tests = ['"abc.def.ghi"', '"opq"']
for input_string in tests:
    print(f"input_string: {input_string}, found: {pattern.findall(input_string)}")
Output
input_string: "abc.def.ghi", found: ['abc.def.ghi']
input_string: "opq", found: []
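The effect of the non-capturing subgroup can be checked directly. This is a small sketch comparing the two variants on an illustrative input; the variable names are made up for the example.

```python
import re

text = '"abc.def.ghi" "opq"'

# Inner group non-capturing: only one capture group exists,
# so findall returns a list of strings.
non_capturing = re.findall(r'"([a-z]+(?:\.[a-z]+)+)"', text)

# Inner group capturing: two capture groups exist, so findall returns
# one tuple per match, (outer, inner), where inner holds only the
# last repetition of the subgroup.
capturing = re.findall(r'"([a-z]+(\.[a-z]+)+)"', text)

print(non_capturing)  # ['abc.def.ghi']
print(capturing)      # [('abc.def.ghi', '.ghi')]
```

With the capturing variant the full string is still available as the first element of each tuple, but the non-capturing form keeps the result a flat list.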
CodePudding user response:
The [a-z\\.] part matches any single character that is either a letter a-z or a dot, so a run of letters with no dot at all still matches. If you want the dot to be required, you will have to move it outside the character set, something like
([a-z]+\\.)
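One way to complete that suggestion into a full pattern is sketched below. It assumes the question's pattern uses + after the character class, and it uses raw strings, so the dot is escaped as \. rather than \\. as in the snippets above. The names loose and strict are only illustrative.

```python
import re

text = '"abc.def.ghi" "opq" "rst"'

# Dot inside the character class: each character may be a letter OR a dot,
# so a string with no dot at all still matches.
loose = re.findall(r'"([a-z\.]+[a-z]*)"', text)

# Dot outside the class: every "label." chunk must end in a literal dot,
# and at least one such chunk is required before the final label.
strict = re.findall(r'"((?:[a-z]+\.)+[a-z]+)"', text)

print(loose)   # ['abc.def.ghi', 'opq', 'rst']
print(strict)  # ['abc.def.ghi']
```

The chunk group is written as non-capturing so that findall returns only the outer group as a string.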