beginner_question = 'When do we add arguments to functions?'

I was recently practicing some Python and ran into a roadblock: I couldn't get my agg() call to work. I later found out that it was because I shouldn't have called the functions I passed to it.

My question is: could somebody please explain what exactly we are doing when we write () at the end of a function name, and what the difference is between doing it and not doing it?

EDIT: This code is example code; I'm not looking for an answer about this code. I'm looking for an answer on the concept of calling or not calling a function, and how that works.

What I was using, which returned the error 'no a specified' (no argument):

sales_stats = sales.groupby('type')['weekly_sales'].agg([np.min(),np.max(),np.median(),np.mean()])

Correct code:

For each store type, aggregate weekly_sales: get min, max, mean, and median

sales_stats = sales.groupby('type')['weekly_sales'].agg([np.min,np.max,np.median,np.mean])
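A minimal runnable version of the correct code, assuming a small hypothetical `sales` frame (the store types and sales figures are made up for illustration). Column labels of the result can differ between pandas/numpy versions, so the values are what to look at:

```python
import numpy as np
import pandas as pd

# Hypothetical sales data: two store types, A and B
sales = pd.DataFrame({
    "type": ["A", "A", "A", "B", "B"],
    "weekly_sales": [10.0, 20.0, 60.0, 5.0, 15.0],
})

# Pass the functions themselves (no parentheses); agg calls each one per group
sales_stats = sales.groupby("type")["weekly_sales"].agg([np.min, np.max, np.median, np.mean])
print(sales_stats)
```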

Answer:

sales.groupby('type')['weekly_sales'].agg([np.min,...])

sales is a pandas DataFrame, and groupby('type') is a method call that returns a GroupBy object, which in turn has an agg method.

Looking up its docs:

https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.core.groupby.DataFrameGroupBy.agg.html

According to that page, the first argument of agg is:

func : function, string, dictionary, or list of string/functions

In Python, functions are 'first-class objects': they can be passed as arguments just like numbers and lists, and they can be put in a list as well.
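A plain-Python sketch of what "first-class" means here. The names `shout`, `whisper` and `apply_all` are hypothetical, chosen only for illustration; `apply_all` plays roughly the role agg plays, receiving functions and calling them itself:

```python
def shout(text):
    return text.upper() + "!"

def whisper(text):
    return text.lower() + "..."

# No parentheses: speak now refers to the same function object as shout
speak = shout

# Functions can be stored in a list, like any other value
styles = [shout, whisper]

# A function can receive other functions and decide when to call them
def apply_all(funcs, value):
    return [f(value) for f in funcs]  # the () happens here, with an argument

print(speak("hi"))                 # HI!
print(apply_all(styles, "Hello"))  # ['HELLO!', 'hello...']
```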

np.max is a function (in the numpy module). [np.max, np.min] is a list of functions.

np.max is the function:

In [2]: np.max
Out[2]: <function numpy.amax(a, axis=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)>

np.max(...) is a call of the function; it produces something else, not the function itself. In this case it returns a number:

In [3]: np.max(np.array([1,2,3]))
Out[3]: 3

agg wants the function, not the number. agg takes care of calling np.max itself, with the values (arrays or Series) from each group.
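To make that division of labour concrete, here is a hypothetical `simple_agg` that mimics the idea (it is not how pandas implements agg): it receives uncalled functions and calls each one on each group's data:

```python
import numpy as np

def simple_agg(groups, funcs):
    """Hypothetical stand-in for agg: apply every function to every group."""
    result = {}
    for key, values in groups.items():
        # Each f is a function object; it is called here, with the group's data
        result[key] = [f(values) for f in funcs]
    return result

# Made-up groups of values
groups = {"A": np.array([10.0, 20.0, 60.0]), "B": np.array([5.0, 15.0])}
stats = simple_agg(groups, [np.min, np.max, np.median, np.mean])
print(stats)
```

Passing `[np.min(), ...]` instead would fail before `simple_agg` even started, because each `np.min()` would be evaluated (without data) while building the list.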

Note that just adding () to a function name may not do anything useful; it may even raise an error.
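A small sketch of that failure mode, assuming a hypothetical function `needs_a` with one required parameter. Calling it with bare () raises TypeError, and `np.min()` fails for the same reason (the exact message text varies between numpy versions):

```python
import numpy as np

def needs_a(a):
    """Hypothetical function with one required argument."""
    return a * 2

def raises_type_error(func):
    """Call func with no arguments and report whether that raised TypeError."""
    try:
        func()  # the () happens here, with nothing inside
    except TypeError as err:
        print("TypeError:", err)
        return True
    return False

print(needs_a(21))                 # called with its argument: 42
print(raises_type_error(needs_a))  # bare () call: argument 'a' is missing
print(raises_type_error(np.min))   # np.min() fails the same way
```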

So your question is partly basic Python: the difference between a function and calling the function. But it is also a pandas and numpy question, and as such it requires reading the respective function/method documentation.

Note that the agg docs specify what the function itself must accept.
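The same rule applies to your own functions. A minimal sketch, assuming a hypothetical `value_range` helper and made-up data; in recent pandas versions a function passed to `SeriesGroupBy.agg` is called once per group with that group's values (as a Series) and should return a single value:

```python
import pandas as pd

def value_range(values):
    """Hypothetical aggregation: largest minus smallest value in one group."""
    return values.max() - values.min()

sales = pd.DataFrame({
    "type": ["A", "A", "A", "B", "B"],
    "weekly_sales": [10.0, 20.0, 60.0, 5.0, 15.0],
})

# Pass value_range itself; agg calls value_range(group_values) for each group
ranges = sales.groupby("type")["weekly_sales"].agg(value_range)
print(ranges)
```

Writing `agg(value_range())` instead would raise TypeError, because `value_range` would be called with no values before agg ever runs.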

Take the sample frame from the agg docs:

It shows providing agg with a string:

In [9]: df.groupby('A').agg('min')
Out[9]: 
   B         C
A             
1  1 -1.589447
2  3 -0.997238

agg recognizes a specific set of strings, which it converts into function calls. Equivalently, we can pass a function:

In [10]: df.groupby('A').agg(np.min)
Out[10]: 
   B         C
A             
1  1 -1.589447
2  3 -0.997238
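A self-contained check of that equivalence. The frame below is modeled on the docs example, but with fixed values for C instead of random ones (an assumption, so the result is reproducible). Newer pandas versions may emit a FutureWarning when np.min is passed, but the values match:

```python
import numpy as np
import pandas as pd

# Frame shaped like the docs example; C uses fixed values instead of random ones
df = pd.DataFrame({
    "A": [1, 1, 2, 2],
    "B": [1, 2, 3, 4],
    "C": [0.5, -1.5, 2.0, -1.0],
})

by_string = df.groupby("A").agg("min")     # agg looks up the name "min"
by_function = df.groupby("A").agg(np.min)  # agg receives the function object

print(by_string)
print(by_function)
```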

But when we use np.min() as you do, we get an error:

In [11]: df.groupby('A').agg(np.min())
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [11], in <cell line: 1>()
----> 1 df.groupby('A').agg(np.min())

File <__array_function__ internals>:4, in amin(*args, **kwargs)

TypeError: _amin_dispatcher() missing 1 required positional argument: 'a'

You summarized the error as "returned error: 'no a specified' (no argument)". It is not a good idea to do that on SO; you should read the error in full, and show it in full. The traceback tells us that the problem is in the np.min() step. Python evaluates the arguments before making a call, so it never got as far as calling agg.
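A plain-Python sketch of why agg was never reached: arguments are evaluated before the outer call happens. The functions `inner`, `outer` and `needs_a` are hypothetical stand-ins (`outer` for agg, `needs_a` for np.min):

```python
calls = []

def inner():
    calls.append("inner")
    return 1

def outer(value):
    """Hypothetical stand-in for agg."""
    calls.append("outer")
    return value

outer(inner())
order = list(calls)
print(order)  # ['inner', 'outer']: the argument runs first

def needs_a(a):
    """Hypothetical stand-in for np.min: requires one argument."""
    return a

calls.clear()
try:
    outer(needs_a())  # needs_a() fails while building the argument
except TypeError as err:
    print("TypeError:", err)
print(calls)  # []: outer was never called
```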

read the traceback
read the docs