How to convert a vector of vectors into a DataFrame in Julia, without for loop?

Time:07-13

I created a Vector of Vectors, named all_arrays in Julia in this way for a specific purpose:

using DataFrames
using StatsBase

list_of_numbers = 1:17

all_arrays = [zeros(Float64, (17,)) for i in 1:1000]
round = 1
while round != 1001
    random_array = StatsBase.sample(1:17 , length(list_of_numbers))
    random_array = random_array/sum(random_array)

    if (0.0 in random_array) || (random_array in all_arrays)
        continue
    end

    all_arrays[round] = random_array
    round += 1
    println(round)
end

The dimension of all_arrays is:

julia> size(all_arrays)
(1000,)

Then I want to convert all_arrays into a DataFrame with dimensions 1000×17 (note that each vector in all_arrays has shape (17,)). I tried it this way:

df = DataFrames.DataFrame(zeros(1000,17) , :auto)
for idx in 1:length(all_arrays)
    df[idx , :] = all_arrays[idx]
end

But I'm looking for a more straightforward way to do this, without a for loop and a prebuilt DataFrame. Is there one?

CodePudding user response:

If you want simple code use (the length of the code is the same as below, but I find it conceptually simpler):

DataFrame(mapreduce(permutedims, vcat, all_arrays), :auto)

For such small data as you described this should be efficient enough.
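To see what this one-liner does, here is a minimal sketch on a small made-up input (the name vs and its values are illustrative, not from the question). permutedims turns each length-n vector into a 1×n row matrix, and vcat stacks those rows:

```julia
using DataFrames

# Hypothetical toy input standing in for all_arrays: 3 vectors of length 2
vs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# permutedims(v) is a 1×2 Matrix; vcat stacks the rows into a 3×2 Matrix
m = mapreduce(permutedims, vcat, vs)

# :auto generates the column names x1, x2
df = DataFrame(m, :auto)
```

Each inner vector ends up as one row of df, which is the layout the question asks for.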

If you want something faster use:

DataFrame([getindex.(all_arrays, i) for i in 1:17], :auto, copycols=false)
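The same idea on a small made-up input (vs is an illustrative name, not from the question): broadcasting getindex over the outer vector collects the i-th element of every inner vector, which is exactly column i of the result.

```julia
using DataFrames

# Hypothetical toy input standing in for all_arrays: 3 vectors of length 2
vs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# getindex.(vs, i) gathers the i-th entry of each inner vector, i.e. column i
cols = [getindex.(vs, i) for i in 1:length(first(vs))]

# copycols=false lets the DataFrame use the freshly built column vectors
# directly instead of copying them again
df = DataFrame(cols, :auto, copycols=false)
```

Because a DataFrame stores its data column by column, building the columns directly avoids creating an intermediate matrix.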

Here is a benchmark:

julia> using BenchmarkTools

julia> @btime DataFrame(mapreduce(permutedims, vcat, $all_arrays), :auto);
  7.257 ms (3971 allocations: 65.22 MiB)

julia> @btime DataFrame([getindex.($all_arrays, i) for i in 1:17], :auto, copycols=false);
  41.000 μs (88 allocations: 140.66 KiB)

CodePudding user response:

The simplest answer for me is

DataFrame(permutedims(hcat(all_arrays...)), :auto)
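One thing to check with this form is orientation: hcat places each inner vector in its own column, so n vectors of length k give a k×n matrix. A small sketch with made-up data (vs is an illustrative name), using permutedims to get one row per vector:

```julia
using DataFrames

# Hypothetical toy input standing in for all_arrays: 3 vectors of length 2
vs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

wide = hcat(vs...)        # 2×3 Matrix: each inner vector is a column
tall = permutedims(wide)  # 3×2 Matrix: each inner vector is a row

df = DataFrame(tall, :auto)
```

For the data in the question, the same steps would give a 17×1000 matrix from hcat and a 1000×17 matrix after permutedims.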

Also, it's worth noting (in response to your comments on Bogumil's answer) that there is nothing inherently slow about for loops in Julia. Though whether or not they are slow for DataFrame construction, Bogumil will know best.
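As an illustration of the loop-based route, here is a sketch that fills a preallocated matrix inside a function and only then builds the DataFrame. The helper name rows_to_matrix is hypothetical (it is not part of DataFrames or of either answer), and no timing is claimed for it:

```julia
using DataFrames

# Hypothetical helper: copy each inner vector into one row of a
# preallocated matrix (assumes all inner vectors have the same length)
function rows_to_matrix(vs::Vector{Vector{Float64}})
    m = Matrix{Float64}(undef, length(vs), length(first(vs)))
    for (i, v) in enumerate(vs)
        m[i, :] = v
    end
    return m
end

# Toy input standing in for all_arrays
vs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
df = DataFrame(rows_to_matrix(vs), :auto)
```

Keeping the loop inside a function, rather than at global scope, is the usual way to let Julia compile it to efficient code.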
