I have a large dictionary whose list values I want to sort based on one of the lists. For a simple dictionary I would do it like this:
d = {'a': [2, 3, 1], 'b': [103, 101, 102]}
d['a'], d['b'] = [list(i) for i in zip(*sorted(zip(d['a'], d['b'])))]
print(d)
Output:
{'a': [1, 2, 3], 'b': [102, 103, 101]}
My actual dictionary has many keys with list values, so unpacking the zip tuple as above becomes impractical. Is there any way to go over the keys and values without specifying them all? Something like:
d.values() = [list(i) for i in zip(*sorted(zip(d.values())))]
Using d.values() results in SyntaxError: can't assign function call, but I'm looking for something like this.
CodePudding user response:
As far as I understand your question, you could try simple looping:
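order = sorted(range(len(d['a'])), key=d['a'].__getitem__)  # sort order of the reference list, computed once
for k in d:
    d[k] = [d[k][i] for i in order]  # apply the same order to every list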
where d['a'] stores the list that the others should be ordered by. However, using dicts in this way seems slow and messy. Since every entry in your dictionary is presumably a list of the same length, a simple fix would be to store the data in a numpy array and call its argsort method to sort by the i-th column:
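import numpy as np
a = np.array([[2, 103], [3, 101], [1, 102]])  # example data: one column per list, one row per position
i = 0  # index of the column to sort by
a = a[a[:, i].argsort()]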
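To connect the array approach back to the dictionary from the question, here is a minimal sketch of the round trip. It assumes numpy is installed and that all lists have the same length and a compatible element type (mixed types would be coerced to a common dtype). The names keys, a_sorted and result are only illustrative.

```python
import numpy as np

d = {'a': [2, 3, 1], 'b': [103, 101, 102]}

keys = list(d)
# Assumption: every list has the same length, so each key becomes one column.
a = np.column_stack([d[k] for k in keys])
i = keys.index('a')  # column to sort by
# Reorder all rows by the values in column i.
a_sorted = a[a[:, i].argsort(kind='stable')]
# Convert the columns back into a dict of plain Python lists.
result = dict(zip(keys, a_sorted.T.tolist()))
print(result)
```

The original d is left untouched here; assign result back to d if you want to replace it.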
Finally, the clearest approach would be to use a pandas DataFrame, which is designed to store large amounts of data using a dict-like syntax. In this way, you could just sort by the contents of a named column 'a':
import pandas as pd
df = pd.DataFrame(d)  # d is the dictionary of equal-length lists
df = df.sort_values(by='a')  # sort_values returns a new DataFrame
For further reference, see "Sorting arrays in NumPy by column" and the pandas sort_values documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html
CodePudding user response:
For the given input data and required output, this will suffice:
from operator import itemgetter
d = {'a': [2, 3, 1], 'b': [103, 101, 102]}
def sort_dict(dict_, refkey):
    # Pair each reference value with its original position, then sort by value.
    (reflist := [(v, i) for i, v in enumerate(dict_[refkey])]).sort(key=itemgetter(0))
    for v in dict_.values():
        v_ = v[:]  # copy, so reads are not affected by the in-place writes below
        for i, (_, p) in enumerate(reflist):
            v[i] = v_[p]
sort_dict(d, 'a')
print(d)
Output:
{'a': [1, 2, 3], 'b': [102, 103, 101]}
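If you would rather not modify the lists in place, the same idea can be sketched as a function that returns a new dictionary. This is only an illustration: sorted_by_key is a hypothetical helper name, and it assumes all lists have the same length as the reference list.

```python
def sorted_by_key(dict_, refkey):
    """Return a new dict whose lists are all reordered by the sort order of dict_[refkey]."""
    ref = dict_[refkey]
    # Indices of ref in ascending order of value; sorted() is stable,
    # so equal values keep their original relative order.
    order = sorted(range(len(ref)), key=ref.__getitem__)
    # Assumption: every list is at least as long as the reference list.
    return {k: [v[i] for i in order] for k, v in dict_.items()}


d = {'a': [2, 3, 1], 'b': [103, 101, 102], 'c': [4, 5, 6]}
result = sorted_by_key(d, 'a')
print(result)
```

Because the order is computed once from the reference list, it works for any number of keys, and the input dictionary stays unchanged.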
CodePudding user response:
If you have many keys (and they all have list values of equal length), using pandas sort_values would be an efficient way of sorting:
import pandas as pd
d = {'a': [2, 3, 1], 'b': [103, 101, 102], 'c': [4, 5, 6]}
d = pd.DataFrame(d).sort_values(by='a').to_dict('list')
Output:
{'a': [1, 2, 3], 'b': [102, 103, 101], 'c': [6, 4, 5]}
If memory is an issue, you can sort in place. However, since sort_values then returns None, you can no longer chain the operations:
df = pd.DataFrame(d)
df.sort_values(by='a', inplace=True)
d = df.to_dict('list')
The output is the same as above.