I have a DataFrame:
df = DataFrame(a = [1,1,2,2,2])
5×1 DataFrame
Row │ a
│ Int64
─────┼───────
1 │ 1
2 │ 1
3 │ 2
4 │ 2
5 │ 2
and I want to filter for the groups with, let's say, 2 rows, ideally using Chain and potentially DataFramesMeta, but I cannot get it to work.
It does work when I first create a separate column for the group size, like so:
@chain df begin
    groupby(:a)
    @transform(:rows = length(:a))
    @subset(:rows .== 2)
end
2×2 DataFrame
Row │ a rows
│ Int64 Int64
─────┼──────────────
1 │ 1 2
2 │ 1 2
But it doesn't work when I do the same calculation directly within @subset(). Anyone got a clever solution?
CodePudding user response:
Instead of subset, use filter for this:
julia> @chain df begin
           groupby(:a)
           filter(:a => a -> length(a) == 2, _)
       end
GroupedDataFrame with 1 group based on key: a
First Group (2 rows): a = 1
Row │ a
│ Int64
─────┼───────
1 │ 1
2 │ 1
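Note that the result above is still a GroupedDataFrame. If a plain DataFrame is wanted instead, one possible follow-up is to pass the filtered groups to the DataFrame constructor. This is only a minimal sketch written without Chain; the variable names gdf, kept, and result are my own and purely illustrative:

```julia
using DataFrames

df = DataFrame(a = [1, 1, 2, 2, 2])

# Group by :a, then keep only the groups that have exactly 2 rows.
gdf = groupby(df, :a)
kept = filter(:a => a -> length(a) == 2, gdf)

# Collapse the remaining groups back into an ordinary DataFrame.
result = DataFrame(kept)
```

Here result should hold just the two rows where a is 1.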
CodePudding user response:
In my recent blog post I have explained why we decided that this particular case should not be easily supported (in short: users were confused by pseudo-broadcasting in other cases; however, feel free to open an issue if you disagree with the logic explained there).
The solution is either to use filter as Sundar R suggests (in the 1.4 release we will add an ungroup kwarg to it to make it more convenient, see this issue) or to write:
julia> @chain df begin
           groupby(:a)
           @subset(fill(length(:a) == 2, length(:a)))
       end
2×1 DataFrame
Row │ a
│ Int64
─────┼───────
1 │ 1
2 │ 1
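For comparison, here is a sketch of what I believe is the same idea written with the plain DataFrames.jl subset function instead of the macro. As far as I understand it, the function is called once per group and has to return a Bool vector with one entry per row of that group, which is why fill is used to repeat the scalar comparison. The name result is just illustrative:

```julia
using DataFrames

df = DataFrame(a = [1, 1, 2, 2, 2])

# For each group, build a Bool vector as long as the group itself:
# all true when the group has exactly 2 rows, all false otherwise.
result = subset(groupby(df, :a),
                :a => a -> fill(length(a) == 2, length(a)))
```

By default subset ungroups its result, so this should give a regular DataFrame with only the rows of the a = 1 group.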