This code is to return a subset with size k from the set size n (EPI 5.15). That is, take n > 0, k <= n, and from n we (hypothetically) form a set {0, 1, 2, ..., n-1} from which we pick k elements to form a subset. There are nCk possibilities of picking a subset and we want it to be uniformly picked and we also want the permutation in that subset is also random. There are three versions of codes -- from official solution, my tweak, and my own solution. The latter two are WRONG but I don't know why. I will explain the gist of the algorithm right below the three codes.
Official Solution
def random_subset(n: int, k: int) -> List[int]:
H = {}
for i in range(k):
r = random.randrange(i, n)
rmap = H.get(r, r)
imap = H.get(i, i)
H[r] = imap
H[i] = rmap
return [H[i] for i in range(k)]
Change to the official solution (WRONG)
def random_subset(n: int, k: int) -> List[int]:
H = {}
for i in range(k):
r = random.randrange(i, n)
H[r] = H.get(i, i)
H[i] = H.get(r, r)
return [H[i] for i in range(k)]
My solution (DEAD WRONG)
def random_subset(n: int, k: int) -> List[int]:
H = {}
for i in range(k):
r = random.randrange(i, n)
if r in H:
H[i], H[r] = H[r], i
else:
H[i], H[r] = r, i
return [H[i] for i in range(k)]
Underlying Logic
We pick an element from the part of array A, <0, 1, 2, ..., n-1>, without duplicate. At first, pick r from array A and swap it with A[0]; then pick another r and swap it with A[1] ... until we have filled A[k-1], in total we have k elements, like the following code:
'''
A = <0, 1, 2, ..., n-1>
i 0 1 2 n-1
'''
def random_sampling(k, A):
for i in range(k):
r = random.randrange(i, len(A))
A[i], A[r] = A[r], A[i]
A = [i for i in range(n)]
k = some_constant
random_sampling(k, A)
answer = A[:k]
To reduce space complexity from O(n) to O(k) by mimicking an array <0, 1, 2, ..., n-1>, we changed the right above code into an official solution that uses a hash table from which we select an element to be included in a subset. The problem arises in the way I use hash table differently from original answer but I don't know why.
CodePudding user response:
, the last one seems not to make basic sense. An assignment like
H[i] = H[r], i
binds H[i]
to a 2-tuple, not to an integer.
The middle (second) one is stepping on its own toes:
H[r] = H.get(i, i)
H[i] = H.get(r, r)
The get()
in the second line is more-than-less useless, because H[r]
was bound in the line right above. r
is always in H
when the second line executes, so that pair of lines is the same as, e.g.,
temp = H.get(i, i)
H[i] = H[r] = temp
Which is pretty clearly not what you intended to do.
By the way, if you're keen to reduce the line count for some reason, this should work:
H = {}
for i in range(k):
r = random.randrange(i, n)
H[i], H[r] = H.get(r, r), H.get(i, i)
return [H[i] for i in range(k)]
But I find the first version to be clearest.
Edit: new version of last code
The last version was changed to do:
for i in range(k):
r = random.randrange(i, n)
if r in H:
H[i], H[r] = H[r], i
else:
H[i], H[r] = r, i
Now that would implement the conceptual "swap" logic if it were always the case, at the top of the loop. that H.get(i, i) == i
. But that isn't the case, so it can fail.
For example, start with n=9
and k=5
(this isn't delicate - pretty much arbitrary). On the first loop iteration (i=0
), suppose r=1
is picked. Then H[0]
is set to 1 and H[1]
to 0. That's fine. But, now it's not the case that H.get(1, 1)
is 1 - it's 0. That can cause trouble on the next step.
On the next iteration (i=1
), suppose r=5
is picked. So the code does
H[i], H[r] = r, i # which is
H[1], H[5] = 5, 1
Oops! Now 0 (which was in H[1]
) is no longer in H
, and 1 is in H twice (it was in H[0]
, and now it's also in H[5]
). That's not "a swap" at all.
BTW, there's another way to write this I like better, because it makes it very explicit that once a subset element is picked, that decision will never be changed. It also cuts the size of H
:
def random_subset(n, k):
H = {}
result = []
for i in range(k):
r = random.randrange(i, n)
result.append(H.get(r, r))
H[r] = H.pop(i, i)
return result