Filter grouped DataFrame in Julia

Time:02-28

I have a DataFrame

df = DataFrame(a = [1,1,2,2,2])

5×1 DataFrame
 Row │ a     
     │ Int64 
─────┼───────
   1 │     1
   2 │     1
   3 │     2
   4 │     2
   5 │     2

and I want to keep only the groups that have, let's say, exactly 2 rows, ideally using Chain and potentially DataFramesMeta, but I cannot get it to work.

It does work if I first create a separate column holding the group size, like so:

@chain df begin
    groupby(:a)
    @transform(:rows = length(:a))
    @subset(:rows .== 2)
end

2×2 DataFrame
 Row │ a      rows  
     │ Int64  Int64 
─────┼──────────────
   1 │     1      2
   2 │     1      2

But it doesn't work when I do the same calculation directly inside @subset(). Has anyone got a clever solution?

CodePudding user response:

Instead of subset, use filter for this:

julia> @chain df begin
         groupby(:a)
         filter(:a => a -> length(a) == 2, _)
       end
GroupedDataFrame with 1 group based on key: a
First Group (2 rows): a = 1
 Row │ a     
     │ Int64 
─────┼───────
   1 │     1
   2 │     1
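
Note that the result above is still a GroupedDataFrame. If a plain DataFrame is wanted, the remaining groups can be collected at the end of the chain. A minimal sketch, assuming DataFrames.jl and Chain.jl are installed (the name `result` is only illustrative):

```julia
using DataFrames, Chain

df = DataFrame(a = [1, 1, 2, 2, 2])

result = @chain df begin
    groupby(:a)
    # keep only the groups that have exactly 2 rows
    filter(:a => a -> length(a) == 2, _)
    # collect the remaining groups back into a single DataFrame
    DataFrame(_)
end
```

Here `filter` receives each group's `:a` column as a vector, so `length(a)` is the group size.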

CodePudding user response:

In my recent blog post I have explained why we decided that this particular case should not be easily supported (in short: users were confused by pseudo-broadcasting in other cases; however, feel free to open an issue if you disagree with the logic explained there).

The solution is either to use filter, as Sundar R suggests (in the 1.4 release we will add an ungroup kwarg to it to make it more convenient, see this issue), or to write:

julia> @chain df begin
           groupby(:a)
           @subset(fill(length(:a) == 2, length(:a)))
       end
2×1 DataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1
   2 │     1
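
For comparison, both approaches can also be written without macros. This is only a sketch: it assumes DataFrames.jl 1.4 or later for the `ungroup` keyword mentioned above, and the names `gdf`, `by_filter` and `by_subset` are illustrative.

```julia
using DataFrames

df = DataFrame(a = [1, 1, 2, 2, 2])
gdf = groupby(df, :a)

# filter tests each group once; ungroup = true (assumed DataFrames.jl >= 1.4)
# returns a DataFrame instead of a GroupedDataFrame
by_filter = filter(:a => a -> length(a) == 2, gdf; ungroup = true)

# subset expects one Bool per row, so the group-level test
# is expanded to the group's length with fill
by_subset = subset(gdf, :a => a -> fill(length(a) == 2, length(a)))
```

The `fill` call is what replaces the pseudo-broadcasting that `subset` deliberately does not perform: the condition is computed once per group and then repeated for every row of that group.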