Let's say I have an array with 10 elements:
arr = [1,2,3,4,5,6,7,8,9,10]
Then I want to define a function that takes this arr as a parameter and performs a calculation. For this example, the calculation is the difference of means:
If N=2 (that is, group the elements of arr sequentially into groups of size 2):
results=[]
result_1 = (1+2)/2 - (3+4)/2
result_2 = (3+4)/2 - (5+6)/2
result_3 = (5+6)/2 - (7+8)/2
result_4 = (7+8)/2 - (9+10)/2
The output would be:
results = [-2,-2,-2,-2]
If N=3 (that is, group the elements of arr sequentially into groups of size 3):
results=[]
result_1 = (1+2+3)/3 - (4+5+6)/3
result_2 = (4+5+6)/3 - (7+8+9)/3
The output would be:
results = [-3,-3]
I want to do this by defining two functions:
Function 1 - Creates the arrays that will be used as input for 2nd function:
Parameters: array, N
returns: k groups of arrays -> seems to be ((length(arr)/N) - 1)
Function 2 - The function that takes the arrays two at a time and performs the calculation, in this case the difference of means.
Parameters: array1,array2....arr..arr..
returns: list of the results
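To make the two-function design above concrete, here is a minimal sketch in JavaScript. The names makeGroups, compareNeighbours, mean and diffOfMeans are hypothetical, and dropping a trailing group that has fewer than N elements is an assumption that matches the N=3 example above (where the 10th element is unused).

```javascript
// Function 1 (hypothetical name): split arr into consecutive groups of size N.
// Trailing elements that do not fill a whole group are dropped (an assumption).
function makeGroups(arr, N) {
  const groups = [];
  for (let i = 0; i + N <= arr.length; i += N) {
    groups.push(arr.slice(i, i + N));
  }
  return groups;
}

// Arithmetic mean of a non-empty array of numbers.
const mean = (xs) => xs.reduce((acc, x) => acc + x, 0) / xs.length;

// Function 2 (hypothetical name): apply calc to each pair of neighbouring groups.
function compareNeighbours(groups, calc) {
  const results = [];
  for (let i = 0; i + 1 < groups.length; i++) {
    results.push(calc(groups[i], groups[i + 1]));
  }
  return results;
}

const diffOfMeans = (a, b) => mean(a) - mean(b);
const arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
console.log(compareNeighbours(makeGroups(arr, 2), diffOfMeans)); // [ -2, -2, -2, -2 ]
console.log(compareNeighbours(makeGroups(arr, 3), diffOfMeans)); // [ -3, -3 ]
```

Passing the calculation in as a callback means the difference of means can later be swapped for another two-sample statistic without touching the grouping code.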
Important Note
My idea is to apply these functions to a stream of data, and the calculation will be the PSI (Population Stability Index).
So, if my stream has 10k samples and I set N=1000 in the first function, then the input to the second function will be one block of 1k samples and the next block of 1k samples.
The process will be repeated until the end of the data stream.
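For readers who have not met PSI, the block-by-block comparison described here could look roughly like the JavaScript sketch below. It uses the usual definition, PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i), where e_i and a_i are the proportions of the reference block and the following block in bin i. Everything else is an illustrative assumption and not the PSI code mentioned in this post: the function names, the equal-width bins on a fixed range [lo, hi), the default of 10 bins, and the eps floor for empty bins.

```javascript
// Proportion of samples per equal-width bin on [lo, hi); out-of-range values
// are clamped into the first or last bin. Binning scheme and eps are assumptions.
function binProportions(samples, lo, hi, bins, eps = 1e-6) {
  const counts = new Array(bins).fill(0);
  for (const x of samples) {
    const idx = Math.floor(((x - lo) / (hi - lo)) * bins);
    counts[Math.min(bins - 1, Math.max(0, idx))] += 1;
  }
  // eps keeps empty bins from producing log(0) or a division by zero.
  return counts.map((c) => Math.max(c / samples.length, eps));
}

// PSI between a reference block (expected) and the block that follows it (actual).
function psi(expected, actual, lo, hi, bins = 10) {
  const e = binProportions(expected, lo, hi, bins);
  const a = binProportions(actual, lo, hi, bins);
  return e.reduce((sum, ei, i) => sum + (a[i] - ei) * Math.log(a[i] / ei), 0);
}

// Walk the data in blocks of N and compare each block with the next one.
function psiOverBlocks(data, N, lo, hi, bins = 10) {
  const results = [];
  for (let i = 0; i + 2 * N <= data.length; i += N) {
    const current = data.slice(i, i + N);
    const next = data.slice(i + N, i + 2 * N);
    results.push(psi(current, next, lo, hi, bins));
  }
  return results;
}
```

With 10k samples and N=1000 this yields 9 values, one per pair of neighbouring blocks.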
I was trying to do this in Python (I already have the PSI code ready), but I have now decided to use Julia for it, and I am pretty new to Julia. So if anyone can shed some light here, it would be very helpful.
CodePudding user response:
In Julia, if you have a big Vector and you want to calculate some statistics on groups of 3 elements, you could do:
julia> using Statistics # provides mean
julia> a = collect(1:15); # creates a Vector [1,2,...,15]
julia> mean.(eachcol(reshape(a,3,length(a)÷3)))
5-element Vector{Float64}:
2.0
5.0
8.0
11.0
14.0
Note that both reshape and eachcol are non-allocating, so no data gets copied in the process.
If the length of a is not divisible by 3, you could truncate it before reshaping. To avoid allocation, use view for that:
julia> a = collect(1:16);
julia> mean.(eachcol(reshape(view(a,1:(length(a)÷3)*3),3,length(a)÷3)))
5-element Vector{Float64}:
2.0
5.0
8.0
11.0
14.0
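For comparison, the same truncate-without-copying idea can be sketched in JavaScript, the other language used in this thread. This is not a translation of the Julia code: it assumes the data lives in a Float64Array and relies on subarray, which returns a view onto the same underlying buffer instead of copying.

```javascript
// Float64Array.prototype.subarray returns a view on the same buffer (no copy).
const a = Float64Array.from({ length: 16 }, (_, i) => i + 1); // 1, 2, ..., 16
const N = 3;
const usable = a.subarray(0, Math.floor(a.length / N) * N); // leaves out the 16th element

const means = [];
for (let i = 0; i < usable.length; i += N) {
  const group = usable.subarray(i, i + N); // still a view, not a copy
  means.push(group.reduce((acc, x) => acc + x, 0) / N);
}
console.log(means); // [ 2, 5, 8, 11, 14 ]
```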
Depending on what you actually want to do you might also want to take a look at OnlineStats.jl
https://github.com/joshday/OnlineStats.jl
CodePudding user response:
Well, I use JavaScript instead of Python, but it would be the same thing in Python.
You need a chunks function that takes an array and a chunk_size (N). For example, chunks([1,2,3,4], 2) -> [[1,2], [3,4]].
We also have a sum method that adds up all the elements of an array: sum([1,2]) -> 3.
Both JavaScript and Python support coroutines, which you can use for lazy evaluation, in the form of generator functions. This type of function can pause its execution and resume on demand, which is useful for processing a stream of data.
let arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

// JavaScript doesn't have a built-in `chunks` method, so we need to create one.
// A trailing chunk with fewer than N elements is dropped.
Array.prototype.chunks = function* (N) {
    let chunk = [];
    for (let value of this) {
        chunk.push(value);
        if (chunk.length == N) {
            yield chunk;
            chunk = [];
        }
    }
};

// Sum of all elements in the array.
Array.prototype.sum = function () {
    return this.reduce((a, b) => a + b);
};

// Yields mean(previous chunk) - mean(current chunk) for each pair of neighbouring chunks.
function* fnName(arr, N) {
    let chunks = arr.chunks(N);
    let a = chunks.next().value.sum();
    for (let b of chunks) {
        yield (a / N) - ((a = b.sum()) / N);
    }
}

console.log([...fnName(arr, 2)]); // [-2, -2, -2, -2]
console.log([...fnName(arr, 3)]); // [-3, -3]
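Since the point of using generators is to handle a stream, here is a variant sketch that works on any iterable rather than only on arrays, and that avoids modifying Array.prototype. The names chunked, pairwise, mean and source are hypothetical, and source is only a stand-in for a real data stream.

```javascript
// Lazily yield consecutive chunks of size N from any iterable (array, generator, ...).
// A trailing partial chunk is dropped, matching the behaviour of the code above.
function* chunked(iterable, N) {
  let chunk = [];
  for (const value of iterable) {
    chunk.push(value);
    if (chunk.length === N) {
      yield chunk;
      chunk = [];
    }
  }
}

// Lazily yield calc(previous, current) for each pair of neighbouring chunks.
function* pairwise(chunks, calc) {
  let previous = null;
  for (const current of chunks) {
    if (previous !== null) yield calc(previous, current);
    previous = current;
  }
}

// Arithmetic mean of a non-empty array of numbers.
const mean = (xs) => xs.reduce((acc, x) => acc + x, 0) / xs.length;

// A hypothetical data source standing in for a real stream.
function* source() {
  for (let i = 1; i <= 10; i++) yield i;
}

const diffs = pairwise(chunked(source(), 2), (a, b) => mean(a) - mean(b));
console.log([...diffs]); // [ -2, -2, -2, -2 ]
```

Only two chunks are held in memory at any time, so the same pipeline should cope with a source far larger than this example.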