MethodError: no method matching groupby(::DataFrame, ::Vector{Symbol}, ::Pair{typeof(nrow), Symbol})

Time:12-25

I am trying to use the code from this question: "DataFrames.jl : count rows by group while defining count column name". I would like to count the number of rows per group. Here is some reproducible code:

using DataFrames

df = DataFrame(group = ["A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"],
               subgroup = ["X", "X", "X", "Y", "Y", "Y", "X", "X", "Y", "Y", "Y", "Y"])

12×2 DataFrame
 Row │ group   subgroup 
     │ String  String   
─────┼──────────────────
   1 │ A       X
   2 │ A       X
   3 │ A       X
   4 │ A       Y
   5 │ A       Y
   6 │ A       Y
   7 │ B       X
   8 │ B       X
   9 │ B       Y
  10 │ B       Y
  11 │ B       Y
  12 │ B       Y

When running the following code:

combine(groupby(df, [:group, :subgroup], nrow => :n))

it returns this error:

MethodError: no method matching groupby(::DataFrame, ::Vector{Symbol}, ::Pair{typeof(nrow), Symbol})
Closest candidates are:
  groupby(::AbstractDataFrame, ::Any; sort, skipmissing) at ~/.julia/packages/DataFrames/JZ7x5/src/groupeddataframe/groupeddataframe.jl:211

Stacktrace:
 [1] top-level scope
   @ In[51]:1
 [2] eval
   @ ./boot.jl:368 [inlined]
 [3] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1428

I am not sure why this error occurs, since I believe I am running the same code as in that question. Could anyone explain why this happens, and whether this is the right way to count the number of rows per group in Julia?

CodePudding user response:

You've just got the brackets wrong:

combine(groupby(df, [:group, :subgroup]), nrow => :n)

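Note the closing parenthesis before nrow: it ends the call to groupby, and the result of that call (a GroupedDataFrame) is what you pass as the first argument to combine. In your version, nrow => :n ended up as a third argument to groupby, which is why the error says there is no matching groupby method.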
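To make the nesting easier to see, the same fix can be written in two steps. This is only a sketch using the df from the question; the variable names gdf and counts are illustrative and not part of the original code.

```julia
using DataFrames

df = DataFrame(group = ["A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"],
               subgroup = ["X", "X", "X", "Y", "Y", "Y", "X", "X", "Y", "Y", "Y", "Y"])

# Step 1: groupby only receives the data frame and the grouping columns
gdf = groupby(df, [:group, :subgroup])

# Step 2: the operation nrow => :n is an argument of combine, not of groupby
counts = combine(gdf, nrow => :n)
```

Splitting the call this way keeps each function's arguments on its own line, so a misplaced parenthesis is harder to miss.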

CodePudding user response:

Besides Nil's answer, you can also use the following syntax:

combine(groupby(df, [:group, :subgroup]), :group => (length) => :nrow)
# 4×3 DataFrame
#  Row │ group   subgroup  nrow
#      │ String  String    Int64
# ─────┼─────────────────────────
#    1 │ A       X             3
#    2 │ A       Y             3
#    3 │ B       X             2
#    4 │ B       Y             4

In the above, I'm counting the rows for each combination of :group and :subgroup by applying length to the :group column within each group. The first row of the result shows that 3 rows have the group label A together with the subgroup label X.
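As a quick check, the length-based form and the nrow form can be compared directly. This is a minimal sketch assuming the df from the question; the names by_length and by_nrow are illustrative, and both results use the same output column name :n so that the tables are comparable.

```julia
using DataFrames

df = DataFrame(group = ["A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"],
               subgroup = ["X", "X", "X", "Y", "Y", "Y", "X", "X", "Y", "Y", "Y", "Y"])

gdf = groupby(df, [:group, :subgroup])

# Count rows by taking the length of one column within each group
by_length = combine(gdf, :group => length => :n)

# Count rows with the dedicated nrow function
by_nrow = combine(gdf, nrow => :n)

# Compare the two result tables (column names and values)
same_result = by_length == by_nrow
```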

*If you think this doesn't count as a different syntax, please tell me; I'll delete this answer.
