Shuffle a collection of arrays

Time:06-25

I have several corresponding arrays of training data for a model, and I'm trying to shuffle them all into the same random order.

What I thought would work would be this:

rand_inds = np.arange(len(d_bundle))
np.random.shuffle(rand_inds)

for i in [d_bundle, d2_bundle, d_location_bundle, output_bundle]:
    i = i[rand_inds]

However, this doesn't actually modify the arrays in the list; the loop only rebinds the loop variable i. I'd have to reassign each array manually. If I don't want to do that, it seems I could make another list, like c = [d_bundle, d2_bundle, d_location_bundle, output_bundle], shuffle its elements in a loop, and then unpack c back into the bundles. However, wouldn't this use more memory than needed?

Is there a better way?

CodePudding user response:

I got this to work by splitting up the loop:

import numpy as np

# generate test lists. These are your "d_bundle", etc.
l1 = [1, 2, 3, 4, 5]
l2 = [10, 20, 30, 40, 50]
l3 = ['a', 'b', 'c', 'd', 'e']
ll = [l1, l2, l3]  # This is your list of lists

rand_inds = np.arange(len(l1)) # initial, as-is ordering.
np.random.shuffle(rand_inds)   # shuffle rand_inds

for i in range(len(ll)):   # Handle each sublist separately
    l = ll[i]              # Select the sublist to modify
    newl = list(l)         # This reserves memory so we have separate input and output sublists
    for j in range(len(newl)):  # Shuffle the sublist
        newl[j] = l[rand_inds[j]]
    ll[i] = list(newl)     # Put the new sublist in the list of lists

l1, l2, l3 = tuple(ll)     # write the variables back to the original names. Order must match the original `ll` assignment.

I used a lot of list() calls to be very clear about when I wanted to deal with list contents rather than modifying things in-place. This is probably not the most memory-efficient solution, but it should work.
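For NumPy arrays, as in the question, the same idea can be written more compactly. The following is only a sketch under assumptions: the three arrays are made-up stand-ins for the question's bundles, and the seeded generator is there just to make the run reproducible.

```python
import numpy as np

# Hypothetical stand-ins for d_bundle, d2_bundle, output_bundle.
d_bundle = np.array([1, 2, 3, 4, 5])
d2_bundle = np.array([10, 20, 30, 40, 50])
output_bundle = np.array(['a', 'b', 'c', 'd', 'e'])

# One shared permutation of the row indices (seed chosen arbitrarily).
rng = np.random.default_rng(0)
rand_inds = rng.permutation(len(d_bundle))

# Indexing with an index array returns a new array, so each name has to be
# rebound to its shuffled copy; assigning to a loop variable would not do that.
d_bundle, d2_bundle, output_bundle = [
    a[rand_inds] for a in (d_bundle, d2_bundle, output_bundle)
]
```

The temporary tuple holds only references to the existing arrays, so it does not duplicate their data. The shuffled copies created by the indexing are the only extra memory, and the loop in the question would have allocated those as well.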

CodePudding user response:

IIUC, you can do this with indexing:

np.array(ll)[:, rand_inds]

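This reorders every row with the same permutation and returns a new array; the original lists are left unchanged. It requires all the lists to have the same length. Converting them to a single array also gives every row one common dtype: for example, float32 and float64 inputs are both stored as float64. To convert the resulting NumPy array back to a list, append .tolist() to the expression.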
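A small sketch of the dtype behaviour described above, using made-up arrays and a fixed permutation so the result is easy to follow:

```python
import numpy as np

# Hypothetical inputs with different float precisions.
a32 = np.array([1.0, 2.0, 3.0], dtype=np.float32)
a64 = np.array([10.0, 20.0, 30.0], dtype=np.float64)

# Fixed permutation, used here instead of a random one for clarity.
rand_inds = np.array([2, 0, 1])

# Stacking builds one 2-D array, so both rows share a common dtype.
stacked = np.array([a32, a64])
shuffled = stacked[:, rand_inds]  # reorder the columns of every row at once

print(shuffled.dtype)     # float64
print(shuffled.tolist())  # [[3.0, 1.0, 2.0], [30.0, 10.0, 20.0]]
print(a32.dtype)          # float32: the original array is untouched
```

If each array needs to keep its own dtype, indexing the arrays one at a time with rand_inds avoids the stacking step.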

  • NumPy arrays and indexing usually consume less memory and run much faster than explicit Python loops.