Home > other >  Julia: How to efficiently sort subarrays of 2 large arrays in parallel?
Julia: How to efficiently sort subarrays of 2 large arrays in parallel?

Time:06-27

I have large 1D arrays a and b, and an array of pointers I that separates them into subarrays. My a and b barely fit into RAM and are of different dtypes (one contains UInt32s, the other Rational{Int64}s), so I don’t want to join them into a 2D array, to avoid changing dtypes.

For each i in I[2:end], I wish to sort the subarray a[I[i-1],I[i]-1] and apply the same permutation to the corresponding subarray b[I[i-1],I[i]-1]. My attempt at this is:

function sort!(a,b) 
    p=sortperm(a); 
    a[:], b[:] = a[p], b[p]
    end
Threads.@threads for i in I[2:end] 
    sort!( a[I[i-1], I[i]-1],   b[I[i-1], I[i]-1] ) 
    end

However, already on a small example, I see that sort! does not alter the view of a subarray:

a, b = rand(1:10,10), rand(-1000:1000,10) .//1
sort!(a,b);     println(a,"\n",b)  # works like it should

a, b = rand(1:10,10), rand(-1000:1000,10) .//1
sort!(a[1:5],b[1:5]);     println(a,"\n",b)  # does nothing!!!

Any help on how to create such function sort! (as efficient as possible) are welcome.


Background: I am dealing with data coming from sparse arrays:

using SparseArrays
n=10^6;   x=sprand(n,n,1000/n); #random matrix with 1000 entries per column on average
x = SparseMatrixCSC(n,n,x.colptr,x.rowval,rand(-99:99,nnz(x)).//1); #chnging entries to rationals
U = randperm(n) #permutation of rows of matrix x
a, b, I = U[x.rowval], x.nzval, x.colptr; 

Thus these a,b,I serve as good examples to my posted problem. What I am trying to do is sort the row indices (and corresponding matrix values) of entries in each column.

Note: I already asked this question on Julia discourse here, but received no replies nor comments. If I can improve on the quality of the question, don't hesitate to tell me.

CodePudding user response:

The problem is that a[1:5] is not a view, it's just a copy. instead make the view like

function sort!(a,b) 
    p=sortperm(a); 
    a[:], b[:] = a[p], b[p]
end
Threads.@threads for i in I[2:end] 
    sort!(view(a, I[i-1]:I[i]-1), view(b, I[i-1]:I[i]-1)) 
end

is what you are looking for

ps.

the @view a[2:3], @view(a[2:3]) or the @views macro can help making thins more readable.

CodePudding user response:

First of all, you shouldn't redefine Base.sort! like this. Now, sort! will shadow Base.sort! and you'll get errors if you call sort!(a).

Also, a[I[i-1], I[i]-1] and b[I[i-1], I[i]-1] are not slices, they are just single elements, so nothing should happen if you sort them either with views or not. And sorting arrays in a moving-window way like this is not correct.

What you want to do here, since your vectors are huge, is call p = partialsortperm(a[i:end], i:i block_size-1) repeatedly in a loop, choosing a block_size that fits into memory, and modify both a and b according to p, then continue to the remaining part of a and find next p and repeat until nothing remains in a to be sorted. I'll leave the implementation as an exercise for you, but you can come back if you get stuck on something.

  • Related