I have a format string that can be changed by someone else, say:
sample = f"This is a {pet} it has {number} legs"
And I currently have two strings:
a = "This is a dog it has 4 legs"
b = "This was a dog"
How can I check which string satisfies this sample format?
I could use Python's str.replace() on sample to build a regex from it and then check with re.match. The catch is that sample can change, so hard-coding the replace() calls won't always work, since sample may gain more placeholders.
Answer:
A simple way to extract the values would be:
import re

patt = re.compile(r'This is a (.+) it has (\d+) legs')
a = "This is a dog it has 4 legs"
b = "This was a dog"
match = patt.search(a)  # patt.search(b) would return None
print(match.group(1), match.group(2))  # dog 4
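Since the question is whether a whole string satisfies the format, it may be worth noting that search() accepts the pattern anywhere inside the string. A small sketch comparing it with fullmatch() on the same hard-coded pattern (the text variable is just a made-up example):

```python
import re

patt = re.compile(r'This is a (.+) it has (\d+) legs')
text = "Well, This is a dog it has 4 legs, I think"

# search() scans for the pattern anywhere, so surrounding text is accepted
print(patt.search(text) is not None)     # True
# fullmatch() requires the entire string to match the pattern
print(patt.fullmatch(text) is not None)  # False
print(patt.fullmatch("This is a dog it has 4 legs").groups())  # ('dog', '4')
```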
Answer:
Try this.
sample = "This is a {pet} it has {number} legs"
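
def check(string):
    patt = sample.split(' ')
    # positions of placeholder words such as '{pet}'
    index = [i for i, v in enumerate(patt) if '{' in v and '}' in v]
    words = string.split(' ')
    # same number of words, and every non-placeholder word must match exactly
    if len(words) == len(patt) and all(v == patt[i] or i in index for i, v in enumerate(words)):
        print('string matches the pattern')
    else:
        print('string does not match the pattern')

a = "This is a dog it has 4 legs"
b = "This was a dog"
check(a)  # string matches the pattern
check(b)  # string does not match the pattern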
Answer:
First, if you want to match against a template string, don't use the f'' string prefix, or it will be evaluated immediately. Instead, just write the format string like:
sample = 'This is a {pet} it has {number} legs'
Here's a function I wrote for one project for parsing a format string and converting it to a regular expression:
import re
import string

def format_to_re(format_str, **kwargs):
    r"""
    Convert a format string to a regular expression, such that any format
    fields may be replaced with regular expression syntax, and any literals are
    properly escaped.

    As a special case, if a 2-tuple is given for the value of a field, the
    first time the field appears in the format string the first element of the
    tuple is used as the replacement, and the second element is used for all
    subsequent replacements.

    Examples
    --------

    This example uses a backslash just to add a little Windows flavor:

    >>> filename_format = \
    ...     r'scenario_{scenario}\{name}_{scenario}_{replicate}.npz'
    >>> filename_re = format_to_re(filename_format,
    ...     scenario=(r'(?P<scenario>0*\d+)', r'0*\d+'),
    ...     replicate=r'0*\d+', name=r'\w+')
    >>> filename_re
    'scenario_(?P<scenario>0*\\d+)\\\\\\w+_0*\\d+_0*\\d+\\.npz'
    >>> import re
    >>> filename_re = re.compile(filename_re)
    >>> filename_re
    re.compile(...)

    This regular expression can be used to match arbitrary filenames to
    determine whether or not they are in the format specified by the original
    ``filename_format`` template, as well as to extract the values of fields by
    using groups:

    >>> match = filename_re.match(r'scenario_000\my_model_000_000.npz')
    >>> match is not None
    True
    >>> match.group('scenario')
    '000'
    >>> filename_re.match(r'scenario_000\my_model_garbage.npz') is None
    True
    """

    formatter = string.Formatter()
    new_format = []
    seen_fields = set()

    for item in formatter.parse(format_str):
        literal, field_name, spec, converter = item
        # literal text is escaped so it only matches itself
        new_format.append(re.escape(literal))

        if field_name is None:
            continue

        replacement = kwargs[field_name]
        if isinstance(replacement, tuple) and len(replacement) == 2:
            # first occurrence uses element 0, later occurrences element 1
            if field_name in seen_fields:
                replacement = replacement[1]
            else:
                replacement = replacement[0]

        new_format.append(replacement)
        seen_fields.add(field_name)

    return ''.join(new_format)
You can use this on your example like:
>>> sample_re = format_to_re(sample, pet=r'(?P<pet>.+)', number=r'(?P<number>\d+)')
>>> sample_re = re.compile(sample_re)
>>> sample_re
re.compile('This\\ is\\ a\\ (?P<pet>.+)\\ it\\ has\\ (?P<number>\\d+)\\ legs')
>>> m = sample_re.match('This is a dog it has 4 legs')
>>> m.groupdict()
{'pet': 'dog', 'number': '4'}
Depending on your use case, you may be able to simplify it a bit; the original version handled some application-specific cases.
Another possible enhancement: given an arbitrary format string, provide a default regex for each field found in it, possibly chosen based on the field's format specifier.
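A rough sketch of that enhancement, under a few assumptions of my own: every field is named with a valid Python identifier (so it can double as a regex group name), the hypothetical helper default_field_regex looks only at whether the spec ends in `d` (integer) and otherwise falls back to a lazy `.+?`, and a repeated field is matched with a backreference rather than the 2-tuple mechanism used above. The function names here are illustrative, not part of any library.

```python
import re
import string

def default_field_regex(spec):
    # hypothetical rule: only the last character of the format spec is considered
    if spec.endswith('d'):
        return r'-?\d+'  # integer fields such as {number:d}
    return r'.+?'        # anything else: one or more characters, matched lazily

def format_to_default_re(format_str):
    parts = []
    seen = set()
    for literal, field_name, spec, _conversion in string.Formatter().parse(format_str):
        parts.append(re.escape(literal))  # literal text must match exactly
        if field_name is None:
            continue
        if field_name in seen:
            # a repeated field must match the same text as its first occurrence
            parts.append(f'(?P={field_name})')
        else:
            parts.append(f'(?P<{field_name}>{default_field_regex(spec or "")})')
            seen.add(field_name)
    return ''.join(parts)

sample = "This is a {pet} it has {number:d} legs"
sample_re = re.compile(format_to_default_re(sample))
print(sample_re.fullmatch("This is a dog it has 4 legs").groupdict())  # {'pet': 'dog', 'number': '4'}
print(sample_re.fullmatch("This was a dog"))  # None
```

Using fullmatch() here means the anchoring does not have to be built into the generated regex.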
Answer:
When you run:
sample = f"This is a {pet} it has {number} legs"
sample does not contain any placeholders. It is the string "This is a xxx it has yyy legs", where xxx and yyy have already been replaced by the values of pet and number. So, unless you know what the parameters were, there is little you can do.
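To see this concretely, here is a small sketch that asks string.Formatter which replacement fields it can still find. The values of pet and number and the helper field_names are made up for the illustration:

```python
import string

pet, number = "dog", 4  # whatever happens to be in scope when the f-string runs
evaluated = f"This is a {pet} it has {number} legs"
template = "This is a {pet} it has {number} legs"

def field_names(fmt):
    # names of the replacement fields that string.Formatter finds in fmt
    return [name for _, name, _, _ in string.Formatter().parse(fmt) if name is not None]

print(evaluated)               # This is a dog it has 4 legs
print(field_names(evaluated))  # []
print(field_names(template))   # ['pet', 'number']
```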
If you want to keep the placeholders, don't use an f-string:
sample = "This is a {pet} it has {number} legs"
formatted_string = sample.format(**{'pet': 'dog', 'number': '4'})
# "This is a dog it has 4 legs"
You can then run something like:
import re
import string
from operator import itemgetter

sample = "This is a {pet} it has {number} legs"
# map every field name in the format string to the same regex;
# note that the literal text of sample is not escaped here
keys = {k: r'\w+' for k in filter(None,
        map(itemgetter(1), string.Formatter().parse(sample)))}
# {'pet': '\\w+', 'number': '\\w+'}
regex = re.compile(sample.format(**keys))

a = "This is a dog it has 4 legs"
b = "This was a dog"
regex.match(a)
# <re.Match object; span=(0, 27), match='This is a dog it has 4 legs'>
regex.match(b)
# None
Answer:
I liked these approaches, but I found a two-line solution (I don't know how it performs, but it works!):
import re

def pattern_match(input, pattern):
    # replace each {placeholder} with a capture group and anchor both ends
    regex = re.sub(r'{[^{]*}', '(.*)', "^" + pattern + "$")
    if re.match(regex, input):
        print(f"'{input}' matches the pattern '{pattern}'")

pattern_match(a, sample)
pattern_match(b, sample)
Output
'This is a dog it has 4 legs' matches the pattern 'This is a {pet} it has {number} legs'
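One caveat with the two-liner: the literal text of the pattern is used as regex syntax, so a `?`, `.` or `(` in the template changes its meaning. On performance, Python's re module already caches recently compiled patterns, but the re.sub() that builds the regex still runs on every call. A sketch that escapes the literal pieces and builds each pattern once with functools.lru_cache; compile_pattern and pattern_matches are hypothetical names, and `{{`/`}}` escapes are not handled:

```python
import re
from functools import lru_cache

@lru_cache(maxsize=None)
def compile_pattern(pattern):
    # re.split with a capturing group keeps the placeholders at the odd indices
    pieces = re.split(r'(\{[^{}]*\})', pattern)
    regex = ''.join('(.*)' if i % 2 else re.escape(piece)
                    for i, piece in enumerate(pieces))
    return re.compile(regex)

def pattern_matches(text, pattern):
    # fullmatch() anchors both ends, so no '^' or '$' is needed
    return compile_pattern(pattern).fullmatch(text) is not None

sample = "This is a {pet} it has {number} legs"
print(pattern_matches("This is a dog it has 4 legs", sample))  # True
print(pattern_matches("This was a dog", sample))               # False
print(pattern_matches("Is it ok", "Is {x} ok?"))               # False: '?' is matched literally
```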