I am trying to use the code from this question: "DataFrames.jl : count rows by group while defining count column name". I would like to count the number of rows per group. Here is some reproducible code:
using DataFrames
df = DataFrame(group = ["A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"],
               subgroup = ["X", "X", "X", "Y", "Y", "Y", "X", "X", "Y", "Y", "Y", "Y"])
12×2 DataFrame
 Row │ group   subgroup
     │ String  String
─────┼──────────────────
   1 │ A       X
   2 │ A       X
   3 │ A       X
   4 │ A       Y
   5 │ A       Y
   6 │ A       Y
   7 │ B       X
   8 │ B       X
   9 │ B       Y
  10 │ B       Y
  11 │ B       Y
  12 │ B       Y
When I run the following code:
combine(groupby(df, [:group, :subgroup], nrow => :n))
it returns this error:
MethodError: no method matching groupby(::DataFrame, ::Vector{Symbol}, ::Pair{typeof(nrow), Symbol})
Closest candidates are:
groupby(::AbstractDataFrame, ::Any; sort, skipmissing) at ~/.julia/packages/DataFrames/JZ7x5/src/groupeddataframe/groupeddataframe.jl:211
Stacktrace:
[1] top-level scope
@ In[51]:1
[2] eval
@ ./boot.jl:368 [inlined]
[3] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
@ Base ./loading.jl:1428
I am not sure why this error happens, since I believe I am running the same code as in that question. Could anyone explain why this happens, and whether this is the right way to count the number of rows per group in Julia?
CodePudding user response:
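You've just got the parentheses in the wrong place:
combine(groupby(df, [:group, :subgroup]), nrow => :n)
Note the first closing parenthesis before nrow: it closes the call to groupby, and it is the result of that call which you pass as the first argument to combine. In your version, nrow => :n ended up inside the groupby call as a third positional argument, which is why the MethodError says no matching groupby method exists.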
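To make the nesting easier to see, the same call can be split into two steps. This is only a sketch using the data from the question; the intermediate names gdf and counts are mine, not from the original post.

```julia
using DataFrames

df = DataFrame(group = ["A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"],
               subgroup = ["X", "X", "X", "Y", "Y", "Y", "X", "X", "Y", "Y", "Y", "Y"])

# Step 1: groupby receives only the data frame and the grouping columns.
gdf = groupby(df, [:group, :subgroup])

# Step 2: combine receives the grouped data frame plus the operation.
# nrow => :n stores the number of rows of each group in a column named n.
counts = combine(gdf, nrow => :n)
```

Written this way, it is harder to pass nrow => :n to groupby by accident, because each function call sits on its own line.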
CodePudding user response:
Besides Nil's answer, you can also use the following syntax:
combine(groupby(df, [:group, :subgroup]), :group => (length) => :nrow)
# 4×3 DataFrame
#  Row │ group   subgroup  nrow
#      │ String  String    Int64
# ─────┼─────────────────────────
#    1 │ A       X             3
#    2 │ A       Y             3
#    3 │ B       X             2
#    4 │ B       Y             4
In the above, I'm counting the rows for each combination of :group and :subgroup by taking the length of the :group column within each group. The first row of the result tells me the number of samples (rows) that have the A group label along with the X subgroup label.
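As a quick check, the length-based form and the nrow form should give the same counts on the question's data. This is a sketch: the names gdf, by_nrow, by_length, and same_counts are mine, and I use :n as the output column in both calls so the results are directly comparable.

```julia
using DataFrames

df = DataFrame(group = ["A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"],
               subgroup = ["X", "X", "X", "Y", "Y", "Y", "X", "X", "Y", "Y", "Y", "Y"])

gdf = groupby(df, [:group, :subgroup])

# nrow needs no source column: it counts the rows of each group directly.
by_nrow = combine(gdf, nrow => :n)

# length is applied to the :group values of each group; any column would
# give the same number, since every column in a group has the same length.
by_length = combine(gdf, :group => length => :n)

# true when both approaches agree on every group
same_counts = by_nrow.n == by_length.n
```

The nrow form is a little shorter because it does not require picking a column to measure.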
*If you think this doesn't really count as a different syntax, please tell me and I'll delete this answer.