Sorting lists in dictionary based on other list without assigning them again

I have a large dictionary whose list values I want to sort based on one of the lists. For a simple dictionary I would do it like this:

d = {'a': [2, 3, 1], 'b': [103, 101, 102]}

d['a'], d['b'] = [list(i) for i in zip(*sorted(zip(d['a'], d['b'])))]

print(d)

Output:

{'a': [1, 2, 3], 'b': [102, 103, 101]}

My actual dictionary has many keys with list values, so unpacking the zip tuple as above becomes impractical. Is there a way to go over the keys and values without specifying them all? Something like:

d.values() = [list(i) for i in zip(*sorted(zip(d.values())))]

Assigning to d.values() results in SyntaxError: can't assign function call, but I'm looking for something like this.

CodePudding user response：

As far as I understand your question, you could try simple looping:

order = sorted(range(len(d['a'])), key=d['a'].__getitem__)  # indices that sort d['a']
for k in d:
    d[k] = [d[k][i] for i in order]  # apply the same order to every list

where d['a'] holds the reference list whose order the others should follow. However, using dicts this way seems slow and messy. Since every entry in your dictionary is presumably a list of the same length, a simple fix would be to store the data in a NumPy array and use argsort to sort by the i-th column:

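import numpy as np
a = np.array(list(d.values())).T  # d is the dictionary from the question; one column per key
i = 0                             # index of the column to sort by (here 'a')
a = a[a[:, i].argsort()]          # reorder all rows by column i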

Finally, the clearest approach would be to use a pandas DataFrame, which is designed to store large amounts of data using a dict-like syntax. That way, you can simply sort by the contents of the named column 'a':

import pandas as pd
df = pd.DataFrame(d)         # d is the dictionary from the question
df = df.sort_values(by='a')  # returns a sorted copy

For further reference, see "Sorting arrays in NumPy by column" and https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html
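
Staying in plain Python, the zip idiom from the question also extends to any number of keys without naming them. The following is only a minimal sketch: `sorted_by` is a hypothetical helper name, and it assumes all lists are non-empty and of equal length. It builds a new dictionary rather than modifying the lists in place.

```python
def sorted_by(d, refkey):
    """Return a new dict with every list reordered by the order of d[refkey]."""
    ref = list(d).index(refkey)                  # position of the reference key
    rows = sorted(zip(*d.values()),              # one tuple per list position
                  key=lambda row: row[ref])      # sort rows by the reference column
    return dict(zip(d, map(list, zip(*rows))))   # transpose back into per-key lists

d = {'a': [2, 3, 1], 'b': [103, 101, 102], 'c': [4, 5, 6]}
d = sorted_by(d, 'a')
print(d)  # {'a': [1, 2, 3], 'b': [102, 103, 101], 'c': [6, 4, 5]}
```

Sorting on the reference column only (instead of on whole tuples) means the other lists never need to be comparable with each other.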

CodePudding user response：

For the given input data and the required output, this will suffice. It reorders each list in place and needs Python 3.8+ for the := operator:

from operator import itemgetter
d = {'a': [2, 3, 1], 'b': [103, 101, 102]}

def sort_dict(dict_, refkey):
    (reflist := [(v, i) for i, v in enumerate(dict_[refkey])]).sort(key=itemgetter(0))
    for v in dict_.values():
        v_ = v[:]
        for i, (_, p) in enumerate(reflist):
            v[i] = v_[p]

sort_dict(d, 'a')

print(d)

Output:

{'a': [1, 2, 3], 'b': [102, 103, 101]}
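
A more compact variant of the same in-place idea is sketched below. It is an illustration under the assumption that all lists have equal length, and `sort_dict_inplace` is a hypothetical name, not part of the answer above. It computes the sorting indices once and then uses slice assignment, so every list object stays the same and other references to it see the new order.

```python
def sort_dict_inplace(dict_, refkey):
    """Reorder every list in dict_ in place by the sort order of dict_[refkey]."""
    ref = dict_[refkey]
    order = sorted(range(len(ref)), key=ref.__getitem__)  # indices that sort ref
    for v in dict_.values():
        v[:] = [v[p] for p in order]  # slice assignment keeps the same list object

d = {'a': [2, 3, 1], 'b': [103, 101, 102]}
alias = d['b']             # second reference to the same list
sort_dict_inplace(d, 'a')
print(d)                   # {'a': [1, 2, 3], 'b': [102, 103, 101]}
print(alias is d['b'])     # True
```

Because `order` is computed before any list is modified, it does not matter that the reference list itself is reordered during the loop.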

CodePudding user response：

If you have many keys (and all their list values have equal length), pandas sort_values would be an efficient way of sorting:

import pandas as pd
d = {'a': [2, 3, 1], 'b': [103, 101, 102], 'c': [4, 5, 6]}
d = pd.DataFrame(d).sort_values(by='a').to_dict('list')

Output:

{'a': [1, 2, 3], 'b': [102, 103, 101], 'c': [6, 4, 5]}

If memory is an issue, you can sort in place. However, sort_values then returns None, so you can no longer chain the operations:

df = pd.DataFrame(d)
df.sort_values(by='a', inplace=True)
d = df.to_dict('list')

The output is the same as above.