How can I sort a dictionary by value without creating even a temporary copy

I have a dictionary which I need to sort by value. I physically do not have enough memory to do

x = dict(sorted(x.items(), key=lambda item: item[1]))

Is there a way to sort it without creating even a temporary spike in memory usage? I thought I could maybe use pop() to remove items from the original, keeping the same amount of data in memory, but I don't know if there is a simpler way to do it.

CodePudding user response：

This doesn't get you to zero extra space, but it needs much less, so it may still be useful for you or others with the same problem. Memory measurements (peak bytes) with 10^5 items:

12,059,636 baseline
24,408,256 original
19,065,304 better 1
18,709,352 better 2
14,265,296 sort keys, sort vals
12,949,792 sort keys, sort vals 2

The baseline is the peak for just creating the original dict. Your original solution peaks at an additional 12.3 MB on top of that. My best alternative peaks at an additional 0.9 MB.

Code:

import tracemalloc as tm
from random import random
import gc

n = 10**5

def start(label):
  global x, label_
  label_ = label
  gc.collect()
  tm.start()
  x = {random(): random() for _ in range(n)}

def stop():
  global x
  print(f'{tm.get_traced_memory()[1]:10,}', label_)
  tm.stop()
  if label_ != 'baseline':
    assert len(x) == n
    assert list(x.values()) == sorted(x.values()), list(x.values())
  del x
  gc.collect()

for _ in range(2):

  start('baseline')
  stop()

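  start('original')
  # sorted() builds a full list of item tuples while the old dict is still alive.
  x = dict(sorted(x.items(), key=lambda item: item[1]))
  stop()

  start('better 1')
  # Rebinding x to the item list frees the old dict before the new one is built.
  x = list(x.items())
  x.sort(key=lambda item: item[1])
  x = dict(x)
  stop()

  start('better 2')
  # Sort only the keys, then pop each value out of the old dict while the new one is filled.
  ks = list(x)
  ks.sort(key=x.get)
  x = dict(zip(ks, map(x.pop, ks)))
  stop()

  start('sort keys, sort vals')
  # Sort keys (by value) and values as two flat lists, with no per-item tuples.
  # The old dict is deleted before the new one is built.
  keys = list(x)
  keys.sort(key=x.get)
  vals = list(x.values())
  del x
  vals.sort()
  x = dict(zip(keys, vals))
  stop()

  start('sort keys, sort vals 2')
  # Same idea, but sorted in reverse so both lists shrink via pop() as the new dict grows.
  keys = list(x)
  keys.sort(key=x.get, reverse=True)
  vals = list(x.values())
  del x
  vals.sort(reverse=True)
  x = {}
  while keys:
    x[keys.pop()] = vals.pop()
  stop()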

  print()
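
If you want the last variant as a reusable helper, it could look roughly like this. This is only a sketch: the function name is made up for illustration, it assumes the values are mutually comparable, and it still needs two temporary lists of references (keys and values), so it is "low extra memory", not "no extra memory". It reuses the same dict object by calling clear() instead of del, so other references to the dict see the sorted result.

```python
def sort_dict_by_value_in_place(d):
    """Reorder d by value, reusing the same dict object (hypothetical helper)."""
    # Keys ordered by their value, largest first, so pop() yields the smallest.
    keys = list(d)
    keys.sort(key=d.get, reverse=True)
    # Grab the values, then drop the old contents before doing anything else,
    # so the old and the new dict contents never coexist.
    vals = list(d.values())
    d.clear()
    # Both lists are sorted by the same values with the same stable sort,
    # so index i in keys should still pair with index i in vals.
    vals.sort(reverse=True)
    while keys:
        # Popping from the end lets both lists shrink as the dict grows.
        d[keys.pop()] = vals.pop()


d = {'a': 3, 'b': 1, 'c': 2}
sort_dict_by_value_in_place(d)
print(d)
# {'b': 1, 'c': 2, 'a': 3}
```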

CodePudding user response：

You probably don't want to be using a dict for this use case: although dicts in modern Python maintain insertion order, they aren't really built to be sorted.

In fact, if you don't have enough memory to make a single extra copy of this data (even just the references to it), you probably don't even want it all in memory in any form. Consider moving this data into a database that'll support whatever operations the rest of your program needs without having to load all the data into memory.
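
As a rough sketch of the database idea, here is what that could look like with SQLite through the standard library's sqlite3 module. The table name kv, the function name, and the TEXT/REAL column types are assumptions made up for this example, so adjust them to your actual keys and values. In real use you would pass a file path as db_path so the data lives on disk; the ":memory:" default is only there to keep the demo self-contained.

```python
import sqlite3


def iter_sorted_by_value(pairs, db_path=":memory:"):
    """Store (key, value) pairs in SQLite and yield them ordered by value.

    Hypothetical helper: assumes string keys and numeric values.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v REAL)")
        # An index on v lets SQLite walk the rows in value order.
        conn.execute("CREATE INDEX IF NOT EXISTS kv_by_v ON kv (v)")
        with conn:  # commits the inserts as one transaction
            conn.executemany(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", pairs
            )
        # Rows are streamed from the cursor one at a time.
        yield from conn.execute("SELECT k, v FROM kv ORDER BY v")
    finally:
        conn.close()


for key, value in iter_sorted_by_value([("a", 3.0), ("b", 1.0), ("c", 2.0)]):
    print(key, value)
```

pairs can be any iterable, including a generator, so the data never has to sit in a Python dict at all.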

That said, if you need to sort a dictionary but without consuming space in proportion to the size of the dictionary, popping items out and re-adding them in order (i.e. selection sort) seems like the way to do it; more efficient sorting algorithms generally depend on the ability to arbitrarily swap or re-order items (which you can't do with a dict) or make temporary copies of subsets of the data (which we're assuming we can't do under our space constraints). Unfortunately selection sort has O(N^2) time complexity, but it does have the desired O(1) space complexity.

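def inplace_dict_sort(d: dict) -> None:
    """Reorder d by (value, key) in place via selection sort: O(N^2) time."""
    def swap(i):
        # Compare items as (value, key) so ties on value are broken by key.
        return i[1], i[0]
    if not d:
        return  # min() would raise ValueError on an empty dict
    k, v = min(d.items(), key=swap)
    while True:
        # Popping and re-inserting moves the key to the end of the dict's order.
        d[k] = d.pop(k)
        try:
            # Next smallest item that is strictly greater than the one just moved.
            k, v = min((i for i in d.items() if swap(i) > (v, k)), key=swap)
        except ValueError:
            # Nothing is greater than the last item moved: the dict is sorted.
            return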

d = {'a': 'foo', 'b': 'bar', 'c': 'foo', 'd': 'qux', 'e': 'ola'}
inplace_dict_sort(d)
print(d)
# {'b': 'bar', 'a': 'foo', 'c': 'foo', 'e': 'ola', 'd': 'qux'}