I have large 1D arrays a
and b
, and an array of pointers I
that separates them into subarrays. My a
and b
barely fit into RAM and are of different dtypes (one contains UInt32
s, the other Rational{Int64}
s), so I don’t want to join them into a 2D array, to avoid changing dtypes.
For each i in I[2:end]
, I wish to sort the subarray a[I[i-1],I[i]-1]
and apply the same permutation to the corresponding subarray b[I[i-1],I[i]-1]
. My attempt at this is:
function sort!(a,b)
p=sortperm(a);
a[:], b[:] = a[p], b[p]
end
Threads.@threads for i in I[2:end]
sort!( a[I[i-1], I[i]-1], b[I[i-1], I[i]-1] )
end
However, already on a small example, I see that sort!
does not alter the view of a subarray:
a, b = rand(1:10,10), rand(-1000:1000,10) .//1
sort!(a,b); println(a,"\n",b) # works like it should
a, b = rand(1:10,10), rand(-1000:1000,10) .//1
sort!(a[1:5],b[1:5]); println(a,"\n",b) # does nothing!!!
Any help on how to create such function sort!
(as efficient as possible) are welcome.
Background: I am dealing with data coming from sparse arrays:
using SparseArrays
n=10^6; x=sprand(n,n,1000/n); #random matrix with 1000 entries per column on average
x = SparseMatrixCSC(n,n,x.colptr,x.rowval,rand(-99:99,nnz(x)).//1); #chnging entries to rationals
U = randperm(n) #permutation of rows of matrix x
a, b, I = U[x.rowval], x.nzval, x.colptr;
Thus these a,b,I
serve as good examples to my posted problem. What I am trying to do is sort the row indices (and corresponding matrix values) of entries in each column.
Note: I already asked this question on Julia discourse here, but received no replies nor comments. If I can improve on the quality of the question, don't hesitate to tell me.
CodePudding user response:
The problem is that a[1:5]
is not a view, it's just a copy. instead make the view like
function sort!(a,b)
p=sortperm(a);
a[:], b[:] = a[p], b[p]
end
Threads.@threads for i in I[2:end]
sort!(view(a, I[i-1]:I[i]-1), view(b, I[i-1]:I[i]-1))
end
is what you are looking for
ps.
the @view a[2:3]
, @view(a[2:3])
or the @views
macro can help making thins more readable.
CodePudding user response:
First of all, you shouldn't redefine Base.sort!
like this. Now, sort!
will shadow Base.sort!
and you'll get errors if you call sort!(a)
.
Also, a[I[i-1], I[i]-1]
and b[I[i-1], I[i]-1]
are not slices, they are just single elements, so nothing should happen if you sort them either with views
or not. And sorting arrays in a moving-window way like this is not correct.
What you want to do here, since your vectors are huge, is call p = partialsortperm(a[i:end], i:i block_size-1)
repeatedly in a loop, choosing a block_size
that fits into memory, and modify both a
and b
according to p
, then continue to the remaining part of a
and find next p
and repeat until nothing remains in a
to be sorted. I'll leave the implementation as an exercise for you, but you can come back if you get stuck on something.