Problem:
I am generating a search query from key=value pairs. The system being queried does not support searching by the same field twice. I need to generate all unique permutations (assuming that is the correct word) of the pairs so I can generate multiple queries.
Example query:
python test.py --search field_1="books" and (field_2="paper" or (field_2="abcd" and field_4="test")) and field_20=80 and field_20="443" and not field_13=test or field19="test" and field19="4"
Ignore the boolean operations. After parsing I end up with:
['field_1="books"', 'field_2="paper"', 'field_2="abcd"', 'field_4="test"', 'field_20="80"', 'field_20="443"', 'field_13="test"', 'field19="test"', 'field19="4"']
The number and names of the fields used or re-used depend on the user. I want to use this list to generate the output below.
Desired Output:
['field_1="books"', 'field_4="test"', 'field_13="test"', 'field_2="paper"', 'field_20="80"', 'field19="test"']
['field_1="books"', 'field_4="test"', 'field_13="test"', 'field_2="abcd"', 'field_20="443"', 'field19="4"']
and so on...
Or a list of dicts is fine too. I just need every permutation where the same key (field_x) is not used twice in the same list.
Attempts:
I tried to separate out the repeated fields, generate permutations of only those, and then append the result to the non-repeated fields. It seems far more involved than it should be.
repeat_pairs = []
once_pairs = []
for pair in search_pairs:
    key = pair.split('=')[0]
    if key in repeat_keys:
        repeat_pairs.append(pair)
    else:
        once_pairs.append(pair)
print(search_pairs)

def gen_queries(repeat_list):
    master_query_list = []
    for item in repeat_list:
        tmp_list = repeat_list[:]
        key = item.split('=')[0]
        value = item.split('=')[1]
        build = []
        build.append(item)
        tmp_list.remove(item)
        for sub in tmp_list:
            sub_key = sub.split('=')[0]
            sub_value = sub.split('=')[1]
            if key != sub_key:
                build.append(sub)
                tmp_list.remove(sub)
        master_query_list.append(build)
    master_query_list.sort()
    for item in master_query_list:
        print(item)

gen_queries(repeat_pairs)
Outputs:
['field19="4"', 'field_2="paper"', 'field_20="80"', 'field_2="test"']
['field19="test"', 'field_2="paper"', 'field_20="80"', 'field_2="test"']
['field_20="443"', 'field_2="paper"', 'field_2="test"', 'field19="4"']
['field_20="80"', 'field_2="paper"', 'field_2="test"', 'field19="4"']
['field_2="abcd"', 'field_20="80"', 'field19="test"']
['field_2="paper"', 'field_20="80"', 'field19="test"']
['field_2="test"', 'field_20="80"', 'field19="test"']
This feels like something simple that could be done with recursion, but my brain just isn't clicking.
CodePudding user response:
Group these strings into "bins" by their key and compute a product of these bins:
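from collections import defaultdict
from itertools import product

conds = ['field_1="books"', 'field_2="paper"', 'field_2="abcd"', 'field_4="test"', 'field_20="80"', 'field_20="443"', 'field_13="test"', 'field19="test"', 'field19="4"']

# Group the conditions by key; each bin holds every condition seen for that key.
bins = defaultdict(list)
for c in conds:
    # Split on the first '=' only, so a value that itself contains '=' does not break unpacking.
    k, _ = c.split('=', 1)
    bins[k].append(c)

# The Cartesian product takes exactly one condition from each bin per query.
for q in product(*bins.values()):
    print(q)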
Result:
('field_1="books"', 'field_2="paper"', 'field_4="test"', 'field_20="80"', 'field_13="test"', 'field19="test"')
('field_1="books"', 'field_2="paper"', 'field_4="test"', 'field_20="80"', 'field_13="test"', 'field19="4"')
('field_1="books"', 'field_2="paper"', 'field_4="test"', 'field_20="443"', 'field_13="test"', 'field19="test"')
('field_1="books"', 'field_2="paper"', 'field_4="test"', 'field_20="443"', 'field_13="test"', 'field19="4"')
('field_1="books"', 'field_2="abcd"', 'field_4="test"', 'field_20="80"', 'field_13="test"', 'field19="test"')
('field_1="books"', 'field_2="abcd"', 'field_4="test"', 'field_20="80"', 'field_13="test"', 'field19="4"')
('field_1="books"', 'field_2="abcd"', 'field_4="test"', 'field_20="443"', 'field_13="test"', 'field19="test"')
('field_1="books"', 'field_2="abcd"', 'field_4="test"', 'field_20="443"', 'field_13="test"', 'field19="4"')
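Since the question mentions that a list of dicts would also work, here is a minimal sketch of the same bin-and-product idea returning dicts instead of tuples. The function name unique_key_queries is made up for illustration, and stripping the surrounding double quotes from the values is an assumption about what the downstream query builder wants. The shortened conds list is only there to keep the example small.

```python
from collections import defaultdict
from itertools import product


def unique_key_queries(conds):
    """Return one dict per combination, with each key used exactly once.

    Hypothetical helper: the name and the quote stripping are assumptions.
    """
    bins = defaultdict(list)
    for cond in conds:
        # Split on the first '=' only, so values containing '=' stay intact.
        key, value = cond.split('=', 1)
        bins[key].append(value.strip('"'))
    keys = list(bins)
    # product() picks one value per key; zip pairs each pick with its key.
    return [dict(zip(keys, combo)) for combo in product(*bins.values())]


conds = ['field_1="books"', 'field_2="paper"', 'field_2="abcd"',
         'field_20="80"', 'field_20="443"']
queries = unique_key_queries(conds)
for q in queries:
    print(q)
```

The number of queries is the product of the bin sizes, so this shortened input yields 1 * 2 * 2 = 4 dicts, and the full list from the question yields 2 * 2 * 2 = 8, matching the eight tuples above. If the input could get large, iterating over product(...) lazily instead of building the whole list may be preferable.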