I created a Vector of Vectors named all_arrays in Julia, for a specific purpose, like this:
using DataFrames
using StatsBase
list_of_numbers = 1:17
all_arrays = [zeros(Float64, (17,)) for i in 1:1000]
round = 1
while round != 1001
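random_array = StatsBase.sample(1:17 , length(list_of_numbers))  # 17 draws from 1:17, with replacement
random_array = random_array/sum(random_array)  # normalize so the entries sum to 1
if (0.0 in random_array) || (random_array in all_arrays)  # reject zeros and duplicates
continue
end
all_arrays[round] = random_array
round += 1  # advance to the next slot
println(round)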
end
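The loop above checks random_array in all_arrays, which scans every stored vector on each iteration. Also, since every sampled integer is at least 1, the 0.0 in random_array test can never be true. Below is a minimal sketch of the same generation that uses a Set for the duplicate check, assuming only Base Julia: rand(1:17, 17) stands in for StatsBase.sample (both sample with replacement), and the function name unique_normalized_vectors is hypothetical.

```julia
# Hypothetical helper: generate n distinct vectors of length k whose entries are
# integers drawn from 1:k (with replacement), each normalized to sum to 1.
function unique_normalized_vectors(n::Int, k::Int)
    seen = Set{Vector{Float64}}()       # hash-based duplicate check
    result = Vector{Vector{Float64}}()
    while length(result) < n
        v = rand(1:k, k)                # like StatsBase.sample(1:k, k)
        w = v / sum(v)                  # entries are >= 1, so no zeros can appear
        if !(w in seen)
            push!(seen, w)
            push!(result, w)
        end
    end
    return result
end

all_arrays = unique_normalized_vectors(1000, 17)
println(size(all_arrays))  # (1000,)
```

Using a function also avoids reassigning a global counter inside a top-level loop.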
The size of all_arrays is:
julia> size(all_arrays)
(1000,)
Next, I want to convert all_arrays into a 1000×17 DataFrame (note that each vector in all_arrays has shape (17,)). I tried this:
df = DataFrames.DataFrame(zeros(1000,17) , :auto)
for idx in 1:length(all_arrays)
df[idx , :] = all_arrays[idx]
end
But I'm looking for a more straightforward way, without a for loop and a preallocated DataFrame. Is there one?
Answer:
If you want simple code, use the following (it is about as long as the faster version below, but I find it conceptually simpler):
DataFrame(mapreduce(permutedims, vcat, all_arrays), :auto)
For data as small as you describe, this should be efficient enough.
If you want something faster, use:
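To see why the mapreduce version produces one row per vector: permutedims turns each vector into a 1×n matrix, and vcat stacks those rows. A small sketch on toy data (three vectors of length two, chosen only for illustration), in Base Julia without DataFrames; passing the resulting matrix to DataFrame(m, :auto) would then give columns x1, x2.

```julia
# Toy stand-in for all_arrays: three vectors of length 2
vs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

row = permutedims(vs[1])              # 1×2 Matrix, a single row
m = mapreduce(permutedims, vcat, vs)  # rows stacked vertically

println(size(row))  # (1, 2)
println(size(m))    # (3, 2)
println(m[2, :])    # [3.0, 4.0]
```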
DataFrame([getindex.(all_arrays, i) for i in 1:17], :auto, copycols=false)
Here is a benchmark:
julia> using BenchmarkTools
julia> @btime DataFrame(mapreduce(permutedims, vcat, $all_arrays), :auto);
7.257 ms (3971 allocations: 65.22 MiB)
julia> @btime DataFrame([getindex.($all_arrays, i) for i in 1:17], :auto, copycols=false);
41.000 μs (88 allocations: 140.66 KiB)
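In the faster version, getindex.(all_arrays, i) broadcasts getindex over the outer vector, collecting the i-th entry of every inner vector into one new column vector, so the DataFrame is built column by column with no intermediate matrix. Because each column is freshly allocated, copycols=false lets DataFrame use it without copying again. A toy illustration in Base Julia (vs is a hypothetical stand-in for all_arrays):

```julia
vs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Column i collects the i-th entry of every inner vector
cols = [getindex.(vs, i) for i in 1:length(vs[1])]

println(cols[1])  # [1.0, 3.0, 5.0]
println(cols[2])  # [2.0, 4.0, 6.0]
```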
Answer:
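The simplest answer for me is the following (hcat puts each vector in a column, so permutedims is needed to get 1000×17 rather than 17×1000):
DataFrame(permutedims(hcat(all_arrays...)), :auto)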
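A toy check in Base Julia of how hcat orients the vectors, with reduce(hcat, vs) shown as an alternative to splatting that avoids passing every vector as a separate argument (vs is a hypothetical stand-in for all_arrays):

```julia
vs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

h = hcat(vs...)                    # each vector becomes a column
t = permutedims(h)                 # one row per vector
r = permutedims(reduce(hcat, vs))  # same result without splatting

println(size(h))  # (2, 3)
println(size(t))  # (3, 2)
println(t == r)   # true
```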
Also, in response to your comments on Bogumil's answer, it's worth noting that there is nothing inherently slow about for loops in Julia. Whether they are slow for DataFrame construction specifically, Bogumil will know best.
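As a sketch of that point, a plain loop that fills a preallocated Matrix does the same job as the one-liners above, without per-row DataFrame indexing. The loop is placed inside a function, as the Julia performance tips recommend; the name rows_to_matrix is hypothetical, and the result could then be passed to DataFrame(m, :auto).

```julia
# Hypothetical helper: copy a vector of equal-length vectors into the rows of a Matrix.
# Assumes every inner vector has the same length as the first one.
function rows_to_matrix(vs::Vector{Vector{Float64}})
    n, k = length(vs), length(first(vs))
    m = Matrix{Float64}(undef, n, k)
    for i in 1:n, j in 1:k
        m[i, j] = vs[i][j]
    end
    return m
end

vs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
m = rows_to_matrix(vs)
println(m == mapreduce(permutedims, vcat, vs))  # true
```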