I am given a sequence, test = "The quick brown fox jumps over the lazy dog", and a dictionary
{"animal": "fox", "pace": "quick", "action": "jumps"}
that needs to be sorted by value, based on where those values appear in the sequence above, i.e. into the following form:
{"pace": "quick", "animal": "fox", "action": "jumps"}
The way I am looking at this currently is through the use of the sorted() function, but I am unsure what the key parameter should be.
In my current code, I am sorting the dict using:
{k: v for k, v in sorted(d.items(), key=lambda item: test.split().index(item[1]))}
Is there a more efficient way to do this?
In a related question, if the original dictionary contained values in the form of tuples, e.g. {"animals": ("fox", "dog"), "animal": ("fox",)}, would it still be possible to sort it based on the elements, i.e. into {"animal": ("fox",), "animals": ("fox", "dog")}?
CodePudding user response:
I think your code is already very close to what I would call an efficient approach, though the jury's still out on whether it's the most efficient approach possible.
test = "The quick brown fox jumps over the lazy dog"
d = {"animal": "fox", "pace": "quick", "action": "jumps"}
# split into words once, rather than on each `lambda` call
res = dict(sorted(d.items(), key=lambda x, words=test.split(): words.index(x[1])))
print(res)
# {'pace': 'quick', 'animal': 'fox', 'action': 'jumps'}
A likely faster approach is to build a word-to-index mapping beforehand, then simply do a dict lookup for each item.
seen_pos = {word: idx for idx, word in enumerate(test.split())}
res = dict(sorted(d.items(), key=lambda x: seen_pos[x[1]]))
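One caveat with the mapping above: if a word appears more than once in the sentence, the dict comprehension keeps its last position, whereas list.index() returns the first. That does not matter for the example sentence, but here is a minimal sketch that keeps the first position instead. The function name sort_by_first_position and the sample sentence are made up for illustration.

```python
def sort_by_first_position(d, sentence):
    """Order d's items by where each value first appears in sentence."""
    first_pos = {}
    for idx, word in enumerate(sentence.split()):
        # setdefault keeps the earliest index when a word repeats
        first_pos.setdefault(word, idx)
    return dict(sorted(d.items(), key=lambda kv: first_pos[kv[1]]))


repeated = "the quick fox saw the lazy fox"
d = {"animal": "fox", "pace": "quick", "article": "the"}
print(sort_by_first_position(d, repeated))
# {'article': 'the', 'pace': 'quick', 'animal': 'fox'}
```

A value that never occurs in the sentence raises a KeyError here, much like list.index() would raise a ValueError.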
When in doubt, always timeit:
from timeit import timeit
print(timeit('{k: v for k, v in sorted(d.items(), key=lambda item, words=test.split(): test.split().index(item[1]))}',
globals=globals())) # 1.233
print(timeit('dict(sorted(d.items(), key=lambda x: test.split().index(x[1])))',
globals=globals())) # 1.012
print(timeit('dict(sorted(d.items(), key=lambda x: words.index(x[1])))',
setup='words=test.split()',
globals=globals())) # 0.534
print(timeit('dict(sorted(d.items(), key=lambda x: c[x[1]]))',
setup='c = {w: i for i, w in enumerate(test.split())}',
globals=globals())) # 0.442
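For the related question about tuple values, the same position mapping could probably be reused by turning each tuple of words into a tuple of positions and letting Python compare those tuples element by element. This is only a sketch under a few assumptions: every word in every tuple occurs in the sentence, and the helper name sort_by_positions is made up. Note that a one-element tuple needs a trailing comma, ("fox",); without it, ("fox") is just the string "fox".

```python
def sort_by_positions(d, sentence):
    """Order d's items by the sentence positions of the words in each tuple value."""
    pos = {}
    for idx, word in enumerate(sentence.split()):
        pos.setdefault(word, idx)  # first occurrence wins
    # Tuples compare lexicographically, so (3,) sorts before (3, 8)
    return dict(sorted(d.items(), key=lambda kv: tuple(pos[w] for w in kv[1])))


test = "The quick brown fox jumps over the lazy dog"
d = {"animals": ("fox", "dog"), "animal": ("fox",)}
print(sort_by_positions(d, test))
# {'animal': ('fox',), 'animals': ('fox', 'dog')}
```

Because a shorter tuple sorts before a longer one that starts with the same elements, "animal" ends up ahead of "animals", which matches the ordering asked for.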
CodePudding user response:
First observation: We want to use the values in order to figure out something about the key-value pairs (specifically, to sort them). That suggests that the dictionary is the wrong way around; we should invert it first:
test = "The quick brown fox jumps over the lazy dog"
info = {"animal": "fox", "pace": "quick", "action": "jumps"}
info_r = {v:k for k, v in info.items()}
Second observation: now we want to sort the keys of this inverted dict according to their position in the original sentence; but we could just as easily iterate over the sentence and check (with a now O(1) dict lookup) whether they are in the dict. As we go along, we can re-reverse the pairs and build the dict again in a dict comprehension:
{info_r[w]:w for w in test.split() if w in info_r}
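Put together as a runnable sketch (the function name order_by_sentence is just for illustration), with the assumptions spelled out: the values must be hashable and unique, since inverting the dict collapses duplicate values, and any value that never appears in the sentence is silently dropped rather than raising an error.

```python
def order_by_sentence(info, sentence):
    """Rebuild info with its items ordered by where each value occurs in sentence."""
    # Invert once so each word lookup below is a constant-time dict check
    info_r = {v: k for k, v in info.items()}
    return {info_r[w]: w for w in sentence.split() if w in info_r}


test = "The quick brown fox jumps over the lazy dog"
info = {"animal": "fox", "pace": "quick", "action": "jumps"}
print(order_by_sentence(info, test))
# {'pace': 'quick', 'animal': 'fox', 'action': 'jumps'}

# Duplicate values collapse during inversion: only the last key survives
print(order_by_sentence({"animal": "fox", "predator": "fox"}, test))
# {'predator': 'fox'}
```

This does a single pass over the sentence and no sorting at all, so it may be preferable when the sentence is short relative to the dict, as long as the uniqueness assumption holds.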