Sort a dictionary (Python 3.7) by its values (strings/lists) in the order they appear in a sequence

I am given a sequence, test = "The quick brown fox jumps over the lazy dog", and a dictionary:

{"animal": "fox", "pace": "quick", "action": "jumps"}

The dictionary needs to be sorted by value, according to where those values appear in the sequence, giving:

{"pace": "quick", "animal": "fox", "action": "jumps"}

I am approaching this with the sorted() function, but I am unsure what the key parameter should be.

In my current code, I sort the dict using:

{k: v for k, v in sorted(d.items(), key=lambda item: test.split().index(item[1]))}
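For reference, a self-contained version of that snippet. sorted() calls the key function once per (key, value) pair, and because dicts preserve insertion order (a language guarantee since Python 3.7), rebuilding a dict from the sorted pairs keeps that order. The `sorted_d` and `order` names are only added here to inspect the result.

```python
test = "The quick brown fox jumps over the lazy dog"
d = {"animal": "fox", "pace": "quick", "action": "jumps"}

# The key maps each (key, value) pair to the position of its value in the sentence.
# Note: test.split() is re-run inside the lambda for every pair.
sorted_d = {k: v for k, v in sorted(d.items(), key=lambda item: test.split().index(item[1]))}

order = list(sorted_d)  # dict iteration follows insertion order
print(order)  # ['pace', 'animal', 'action']
```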

Is there a more efficient way to do this?

In a related question: if the original dictionary's values were collections, e.g. tuples such as {"animals": ("fox", "dog"), "animal": ("fox",)}, would it still be possible to sort it based on the elements, giving {"animal": ("fox",), "animals": ("fox", "dog")}?

Answer:

I think your approach is already fairly efficient, though it may not be the most efficient one possible: the main avoidable cost is that test.split() runs again for every item being sorted.

test = "The quick brown fox jumps over the lazy dog"

d = {"animal": "fox", "pace": "quick", "action": "jumps"}

# split into words once, rather than on each `lambda` call
res = dict(sorted(d.items(), key=lambda x, words=test.split(): words.index(x[1])))

print(res)
# {'pace': 'quick', 'animal': 'fox', 'action': 'jumps'}
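To see what the default argument buys, the sketch below counts how often the sentence gets split. A default value is evaluated once, when the lambda is created, whereas the lambda body runs for every item. `split_words`, `calls`, `body_calls` and `default_calls` are hypothetical names used only for this illustration.

```python
test = "The quick brown fox jumps over the lazy dog"
d = {"animal": "fox", "pace": "quick", "action": "jumps"}

calls = 0  # hypothetical counter: how many times the sentence is split


def split_words(s):
    """Split s into words and record the call."""
    global calls
    calls += 1
    return s.split()


# Split inside the lambda body: re-runs for every item that sorted() keys.
calls = 0
dict(sorted(d.items(), key=lambda x: split_words(test).index(x[1])))
body_calls = calls

# Split in a default argument: evaluated once, when the lambda is defined.
calls = 0
dict(sorted(d.items(), key=lambda x, words=split_words(test): words.index(x[1])))
default_calls = calls

print(body_calls, default_calls)  # 3 1
```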

A likely faster approach is to build a word-to-index mapping once, then do an O(1) dict lookup for each item instead of a linear list.index() search.

seen_pos = {word: idx for idx, word in enumerate(test.split())}
res = dict(sorted(d.items(), key=lambda x: seen_pos[x[1]]))
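One difference from the list.index() version: if a word occurs more than once, the comprehension keeps its last position, while index() returns the first. This sentence has no exact repeats ("The" and "the" differ in case), but for inputs that do, here is a sketch that keeps the first position. As an assumption about the desired behaviour, it also sends values missing from the sentence to the end instead of raising KeyError; the "colour" entry is a hypothetical addition to show that case.

```python
test = "The quick brown fox jumps over the lazy dog"
# "colour": "red" is a hypothetical entry whose value is not in the sentence.
d = {"animal": "fox", "pace": "quick", "action": "jumps", "colour": "red"}

words = test.split()

# setdefault only stores the first index seen for each word,
# matching what words.index(word) would return.
first_pos = {}
for idx, word in enumerate(words):
    first_pos.setdefault(word, idx)

# Assumption: values absent from the sentence sort after all present ones.
missing = len(words)
res = dict(sorted(d.items(), key=lambda x: first_pos.get(x[1], missing)))
print(res)
# {'pace': 'quick', 'animal': 'fox', 'action': 'jumps', 'colour': 'red'}
```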

When in doubt, measure with timeit (the figures are seconds for timeit's default of 1,000,000 runs and will vary by machine):

from timeit import timeit

print(timeit('{k: v for k, v in sorted(d.items(), key=lambda item, words=test.split(): test.split().index(item[1]))}',
             globals=globals()))  # 1.233
print(timeit('dict(sorted(d.items(), key=lambda x: test.split().index(x[1])))',
             globals=globals()))  # 1.012
print(timeit('dict(sorted(d.items(), key=lambda x: words.index(x[1])))',
             setup='words=test.split()',
             globals=globals()))  # 0.534
print(timeit('dict(sorted(d.items(), key=lambda x: c[x[1]]))',
             setup='c = {w: i for i, w in enumerate(test.split())}',
             globals=globals()))  # 0.442
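The follow-up about collection values isn't covered above. Assuming "sort by the elements" means comparing the positions of each value's words, one option is to key each item on a tuple of those positions. Sorting the positions first makes the key independent of element order, so the same key also works for real sets. Note that a bare ("fox") is just the string "fox"; a one-element tuple needs a trailing comma, ("fox",).

```python
test = "The quick brown fox jumps over the lazy dog"
d = {"animals": ("fox", "dog"), "animal": ("fox",)}

pos = {word: idx for idx, word in enumerate(test.split())}

# Key: sorted tuple of word positions. Tuples compare element by element,
# and a tuple that is a prefix of another sorts first, so (3,) < (3, 8).
res = dict(sorted(d.items(), key=lambda x: tuple(sorted(pos[w] for w in x[1]))))
print(res)
# {'animal': ('fox',), 'animals': ('fox', 'dog')}
```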

Answer:

First observation: we want to use the values to figure out something about the key-value pairs (specifically, how to sort them). That suggests the dictionary is the wrong way around, so we should invert it first:

test = "The quick brown fox jumps over the lazy dog"
info = {"animal": "fox", "pace": "quick", "action": "jumps"}
info_r = {v: k for k, v in info.items()}

Second observation: we now want to sort the keys of this inverted dict by their position in the original sentence, but we can just as easily iterate over the sentence and check, with an O(1) dict lookup, whether each word is in the dict. As we go, we re-reverse each pair and rebuild the dict in a dict comprehension, so no sort is needed at all:

{info_r[w]: w for w in test.split() if w in info_r}
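The inversion assumes every value is unique: if two keys share a value, {v: k ...} keeps only the last of them. Here is a sketch that instead groups keys per value with collections.defaultdict, so no key is lost. The "pet" entry is a hypothetical addition to show the collision.

```python
from collections import defaultdict

test = "The quick brown fox jumps over the lazy dog"
# Hypothetical input where two keys share the value "fox".
info = {"animal": "fox", "pet": "fox", "pace": "quick"}

# A plain inversion keeps only the last key for each value.
naive_r = {v: k for k, v in info.items()}
print(naive_r)  # {'fox': 'pet', 'quick': 'pace'}

# Grouping keys per value keeps all of them.
info_r = defaultdict(list)
for k, v in info.items():
    info_r[v].append(k)

# Walk the sentence once; .get() avoids inserting empty lists for other words.
res = {k: w for w in test.split() for k in info_r.get(w, [])}
print(res)  # {'pace': 'quick', 'animal': 'fox', 'pet': 'fox'}
```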