It's hard to explain this in a title, so please allow me to do so here.
I'm working on a search interface for a utility I'm developing, one with Google-style filters.
It works fine when there's only one filter in the query, but when there are two or more, problems appear.
Let's say I have a query like intitle:foo bar inbody: boo far.
Here, the intitle filter makes it to the second part of the loop and is correctly interpreted as {intitle: foo bar}, but the inbody filter only ever shows up in the first part of the loop, printed together with the previous value as foo bar inbody and followed by its own value boo far.
What should happen is that each filter is recognized and isolated into its own pair, e.g. {intitle: foo bar} and {inbody: boo far}.
Below is the code responsible for this problem.
def ParseFilters(query):
    filterVals = []
    if ":" in query:
        query = query.split(":")
        for part in query:
            # This is the first part of the loop
            print(part)
            if part in filters:
                # This is the second part of the loop
                listIndex = query.index(part)
                filtering = query[listIndex + 1]
                for f in filters:
                    filtering = filtering.strip(f).lstrip()
                pair = {
                    part: filtering
                }
                print(pair)
                filterVals.append(pair)
    return filterVals
The filters list is defined as:
filters = [
    "intitle",
    "inbody"
]
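To see why the loop misreads the second filter, it helps to print what split(":") actually returns for the example query. The sketch below only inspects intermediate values and is not a fix. It also shows a second, smaller pitfall in the loop: str.strip(f) treats its argument as a set of characters to remove from both ends, not as a word. If the goal is to drop an exact trailing word, str.removesuffix (Python 3.9+) does that instead.

```python
query = "intitle:foo bar inbody: boo far"
filters = ["intitle", "inbody"]

# Splitting on ":" cuts at every colon, so the name of the second filter
# ends up glued to the end of the first filter's value.
parts = query.split(":")
print(parts)  # ['intitle', 'foo bar inbody', ' boo far']

# "inbody" is never a list element on its own, so `part in filters`
# is only true for "intitle"; "inbody" is never recognized as a filter.
recognized = [p for p in parts if p in filters]
print(recognized)  # ['intitle']

# str.strip(chars) removes any of the given characters from both ends.
# Here it happens to eat "inbody" letter by letter, leaving a trailing space.
print(repr("foo bar inbody".strip("inbody")))  # 'foo bar '

# Ordinary words that start or end with those letters get damaged too.
print(repr("big city".strip("inbody")))  # 'g cit'
```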
Answer:
If I understand your requirements correctly, I would write something like this:
from collections import defaultdict

filters = [
    "intitle",
    "inbody"
]

query = 'intitle:foo bar inbody: boo far '
result = defaultdict(list)
current_filter = None
for elem in query.split():
    left, _, right = elem.partition(':')
    if left in filters:
        current_filter = left
        if right:
            result[current_filter].append(right)
    else:
        result[current_filter].append(left)
print(result)
Output:
defaultdict(<class 'list'>, {'intitle': ['foo', 'bar'], 'inbody': ['boo', 'far']})
In my opinion, this is slightly more declarative and easier to make more robust later on. You can experiment with it to make it fit your requirements. I suggest you check out str.partition; it is incredibly useful for tasks like this. And defaultdict works just like a regular dictionary, except that a missing key is created automatically with a default value (here, an empty list).
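Here is a quick look at what str.partition and defaultdict(list) do with the tokens this answer produces. The last step, joining each word list back into one string, is an addition that is not in the answer itself; it turns the lists from the output above into the {intitle: foo bar} style pairs the question asked for. The names d, result and pairs are just illustrative.

```python
from collections import defaultdict

# partition always returns a 3-tuple (head, separator, tail);
# when the separator is absent, separator and tail are empty strings.
print("intitle:foo".partition(":"))  # ('intitle', ':', 'foo')
print("inbody:".partition(":"))      # ('inbody', ':', '')
print("bar".partition(":"))          # ('bar', '', '')

# defaultdict(list) creates an empty list the first time a key is used,
# so .append() works without first checking whether the key exists.
d = defaultdict(list)
d["intitle"].append("foo")
d["intitle"].append("bar")
print(dict(d))  # {'intitle': ['foo', 'bar']}

# Illustrative follow-up: join each filter's words into a single string.
result = {"intitle": ["foo", "bar"], "inbody": ["boo", "far"]}
pairs = {name: " ".join(words) for name, words in result.items()}
print(pairs)  # {'intitle': 'foo bar', 'inbody': 'boo far'}
```

Note that joining with a single space normalizes runs of whitespace inside a value, which is usually what a search box wants anyway.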
Answer:
That's because when you do query.split(":"), your program has no way of knowing that inbody is a filter and not part of the intitle value. A better approach is to use a regular expression to find all filters and all values, store them in separate lists (e.g. query_filters and query_values), and then combine them into a dict:
import re
filter_table = ["intitle", "inbody"]
query = "intitle:foo bar inbody: boo far"
# Create a regular expression to match filters
filters_re = re.compile(r"\s*[a-zA-Z]+\:\s*")
# Find all filters
query_filters = filters_re.findall(query)
# Find all values by splitting query at the values matched by filters_re
query_values = filters_re.split(query)
# Cleaning up the strings
query_filters = map(lambda x: x.strip().replace(":", ""), query_filters)
query_values = map(lambda x: x.strip(), filter(None, query_values))
# Make pairs
filter_pairs = zip(query_filters, query_values)
# Remove filters that are not in filter_table
filter_pairs = filter(lambda x: x[0] in filter_table, filter_pairs)
filter_dict = dict(filter_pairs)
print(filter_dict)
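To see what the two regex calls in this answer actually produce, the sketch below prints the intermediate lists. It also shows one edge case worth knowing about: when a filter has an empty value, filter(None, ...) drops that empty piece as well, and zip then pairs the remaining values with the wrong filters. Pairing the names with the split pieces after the first one keeps the two lists aligned; that slicing variant is a suggested alternative, not part of the original answer.

```python
import re

filters_re = re.compile(r"\s*[a-zA-Z]+\:\s*")
query = "intitle:foo bar inbody: boo far"

# findall returns each matched filter token, surrounding whitespace included.
print(filters_re.findall(query))  # ['intitle:', ' inbody: ']

# split returns the text between the matches. With no capturing group there
# is always one more piece than there are matches; piece 0 is whatever
# precedes the first filter (here, an empty string).
print(filters_re.split(query))  # ['', 'foo bar', 'boo far']

# Edge case: "intitle" has an empty value.
odd = "intitle: inbody:foo"
names = [n.strip().rstrip(":") for n in filters_re.findall(odd)]
pieces = filters_re.split(odd)  # ['', '', 'foo']

# Dropping empty strings shifts "foo" onto the wrong filter...
shifted = dict(zip(names, (p.strip() for p in filter(None, pieces))))
# ...while skipping only piece 0 keeps names and values aligned.
aligned = dict(zip(names, (p.strip() for p in pieces[1:])))
print(shifted)  # {'intitle': 'foo'}
print(aligned)  # {'intitle': '', 'inbody': 'foo'}
```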
Or, if you like one-liners:
import re
filter_table = ["intitle", "inbody"]
query = "intitle:foo bar inbody: boo far"
filter_dict = dict(filter(lambda x: x[0] in filter_table, zip(re.findall(r"[a-zA-Z]+(?=\:)", query), map(lambda x: x.strip(), filter(None, re.split(r"\s*[a-zA-Z]+\:\s*", query))))))
print(filter_dict)
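One possible refinement, not part of either answer: the generic pattern [a-zA-Z]+: treats any word followed by a colon as a filter, so a value such as http://example.com gets cut at http: and the URL is lost from the intitle value. The sketch below instead builds the pattern from filter_table and wraps the names in a capturing group, which makes re.split return names and values interleaved. filter_re and parse_filters are hypothetical names; as written, matching is case-sensitive and a filter that appears twice keeps only its last value.

```python
import re

filter_table = ["intitle", "inbody"]

# Only known filter names followed by ":" start a new filter.
# \b stops e.g. "xintitle:" from matching; re.escape guards special characters.
filter_re = re.compile(
    r"\s*\b(" + "|".join(map(re.escape, filter_table)) + r"):\s*"
)

def parse_filters(query):
    # With one capturing group, re.split returns
    # [text_before, name1, value1, name2, value2, ...].
    parts = filter_re.split(query)
    names = parts[1::2]
    values = [v.strip() for v in parts[2::2]]
    return dict(zip(names, values))

print(parse_filters("intitle:foo bar inbody: boo far"))
# {'intitle': 'foo bar', 'inbody': 'boo far'}
print(parse_filters("intitle:see http://example.com inbody:docs"))
# {'intitle': 'see http://example.com', 'inbody': 'docs'}
```

Because the split pieces always alternate name, value, name, value, an empty value simply yields an empty string rather than shifting later values onto the wrong filter.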