Split a DataFrame into a Vector of DataFrames

Time:04-06

I have a DataFrame:

using DataFrames

df = DataFrame(a=[1,1,2,2], b=[6,7,8,9])

4×2 DataFrame
 Row │ a      b     
     │ Int64  Int64 
─────┼──────────────
   1 │     1      6
   2 │     1      7
   3 │     2      8
   4 │     2      9

Is there a canonical way of splitting it into a Vector{DataFrame}, with one DataFrame per distinct value of a? I can do

[df[df.a .== i,:] for i in unique(df.a)]

2-element Vector{DataFrame}:
 2×2 DataFrame
 Row │ a      b     
     │ Int64  Int64 
─────┼──────────────
   1 │     1      6
   2 │     1      7

 2×2 DataFrame
 Row │ a      b     
     │ Int64  Int64 
─────┼──────────────
   1 │     2      8
   2 │     2      9

but is there maybe something more elegant?

Answer:

Use:

julia> gdf = groupby(df, :a, sort=true)
GroupedDataFrame with 2 groups based on key: a
First Group (2 rows): a = 1
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      6
   2 │     1      7
⋮
Last Group (2 rows): a = 2
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     2      8
   2 │     2      9

(You can omit sort=true, but passing it guarantees that the groups are returned in ascending order of the grouping key.)
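To illustrate the effect of sort=true, here is a minimal sketch using a hypothetical data frame df2 (not part of the question) whose key column is not already in ascending order. No claim is made here about the group order when sort is omitted.

```julia
using DataFrames

# Hypothetical data: the key column `a` starts with the larger value.
df2 = DataFrame(a=[2, 2, 1, 1], b=[6, 7, 8, 9])

gdf2 = groupby(df2, :a, sort=true)

# keys(gdf2) lists the group keys in the same order as the groups.
first_key = keys(gdf2)[1].a

# With sort=true the first group is the one with the smallest key,
# even though a == 2 appears first in df2.
first_b = gdf2[1].b
```

Rows inside each group keep the order they had in the original data frame; only the order of the groups themselves is affected.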

Then you can just work with this object as a vector:

julia> gdf[1]
2×2 SubDataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      6
   2 │     1      7

julia> gdf[2]
2×2 SubDataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     2      8
   2 │     2      9

Indexing a group does not copy any data: each group is a SubDataFrame, i.e. a view into your original data frame.
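One consequence of the groups being views is that writing through a group changes the original data frame. The sketch below shows this on a separate copy of the example data, named df_demo here purely for illustration, and only touches the non-grouping column b.

```julia
using DataFrames

# Separate demo data so the mutation below does not affect `df`.
df_demo = DataFrame(a=[1, 1, 2, 2], b=[6, 7, 8, 9])
gdf_demo = groupby(df_demo, :a, sort=true)

g1 = gdf_demo[1]          # a SubDataFrame (view), not a copy

# Write to the first row of the first group through the view...
g1[1, :b] = 60

# ...and the change is visible in the parent data frame.
@show df_demo.b
```

If the groups need to be modified independently of the source, copying them first (for example with DataFrame(g1)) avoids this aliasing.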

If you really need a Vector{DataFrame} (i.e. an independent copy of every group), do:

julia> collect(DataFrame, gdf)
2-element Vector{DataFrame}:
 2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      6
   2 │     1      7
 2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     2      8
   2 │     2      9
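As a possible variation, the same copies can be built with a comprehension, or stored in a dictionary keyed by the group value when lookup by key is more convenient than lookup by position. This is only a sketch; the names parts and by_key are illustrative and not part of the answer above.

```julia
using DataFrames

df = DataFrame(a=[1, 1, 2, 2], b=[6, 7, 8, 9])
gdf = groupby(df, :a, sort=true)

# Comprehension form: DataFrame(g) makes a copy of each SubDataFrame.
parts = [DataFrame(g) for g in gdf]

# Keyed form: pairs(gdf) iterates over key => group pairs,
# and k.a extracts the value of the grouping column from the key.
by_key = Dict(k.a => DataFrame(g) for (k, g) in pairs(gdf))
```

With this, by_key[2] gives the rows where a == 2 without having to know the position of that group in gdf.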