I see these two uses of len:
1. df["col2"] = df["col1"].apply(len)
2. len(df["col1"])
My question is: why is the len function used without parentheses in 1, but with parentheses in 2? What is the difference between the two?
I see this a lot, where a function is used sometimes with and sometimes without parentheses. Can someone explain what exactly is going on?
Thanks.
CodePudding user response:
The first example you mentioned maps the function len over the elements of df["col1"]:
df["col2"] = df["col1"].apply(len)
Whenever you map a function over an iterable, you pass the function itself, without parentheses, so that the mapping code can call it on each element. In your case, df["col1"] must contain elements whose length can be computed, and the result is a pandas Series with the lengths of all the elements. Take the following example.
a = ["1", "2", "3", "4"]
z = list(map(int, a))  # z is [1, 2, 3, 4]
Here, we mapped the built-in int function (which converts each string to an integer) over the entire list.
The second example you mentioned returns the length of the df["col1"] Series itself:
len(df["col1"])
It won't do any operations on the elements within that Series. Take the following example.
a = ["1", "2", "3", "4"]
z = len(a)  # z is 4
In both cases len ends up being called on something that has a length, so neither raises an error. But the outputs are completely different: the first gives one length per element, while the second gives a single number for the whole Series.
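To put the two list examples side by side, here is a minimal sketch in plain Python. It uses map as a stand-in for the pandas apply call, and the list of words is made up for illustration.

```python
words = ["apple", "kiwi", "fig"]  # made-up sample data

# Passing len WITHOUT parentheses hands the function itself to map,
# which then calls it once per element.
per_element = list(map(len, words))

# Calling len WITH parentheses runs it immediately on the whole list.
total = len(words)

print(per_element)  # [5, 4, 3]
print(total)        # 3
```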
CodePudding user response:
In 1, the function len is being passed to a method called apply. That method applies len to each element of the column and returns a Series of lengths. In 2, the function len is being called directly, with the argument df["col1"], to get the length of that column (its number of rows).
In 1, apply is what is sometimes called a "higher-order function", but in principle it's just passing a function to another function for it to use.
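As a rough illustration of that idea, the sketch below defines a small higher-order function. The name apply_each and the sample names are hypothetical; this is not how pandas implements apply, only the same pattern in miniature.

```python
def apply_each(func, items):
    """Hypothetical helper: call func on every item and collect the results."""
    return [func(item) for item in items]

names = ["Ada", "Grace", "Linus"]  # made-up sample data

# The caller decides what happens to each item by passing a function object.
lengths = apply_each(len, names)
upper = apply_each(str.upper, names)

print(lengths)  # [3, 5, 5]
print(upper)    # ['ADA', 'GRACE', 'LINUS']
```

Note that apply_each only ever sees the function object; it adds the parentheses itself when it is ready to call it.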
CodePudding user response:
In the second case you are calling the len function directly and get the result immediately, i.e. how many rows are in col1 of the df.
In the first case you are passing a reference to the len function to the apply method.
This is a shortcut for df["col2"] = df["col1"].apply(lambda x: len(x))
You use this form when a method should be flexible: the caller hands in a function that controls some part of the algorithm, as with apply here. Depending on the contents of the column, you fill the new column with something derived from them, and in this case it was decided to fill it with the lengths of the values in the other column.
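The equivalence between passing len and passing a lambda that calls len can be checked with a short sketch. It again uses the built-in map on a made-up list rather than a real DataFrame.

```python
values = ["a", "bb", "ccc"]  # made-up sample data

# Passing the function by name...
with_name = list(map(len, values))

# ...gives the same result as wrapping it in a lambda that calls it.
with_lambda = list(map(lambda x: len(x), values))

print(with_name)                 # [1, 2, 3]
print(with_name == with_lambda)  # True
```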
CodePudding user response:
len(s) returns the length of the variable s.
len on its own evaluates to the function itself. So if I do a = len, then I can call a(s). Of course, aliasing a built-in like a = len is not recommended in real code.
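A quick sketch of that point, with s set to an arbitrary example string:

```python
s = "hello"  # arbitrary example value

a = len  # no parentheses: a now refers to the same function object as len

same_object = a is len
length_via_alias = a(s)

print(same_object)       # True
print(length_via_alias)  # 5
```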
CodePudding user response:
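Let's have a look at the documentation of DataFrame.apply: its first parameter is func: function, which is a function that will be applied to each column or row of the DataFrame. (Series.apply, used in your example, also takes func as its first parameter and applies it to each value.) In your case this function is len.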
Now let's see what happens when you pass len as a parameter with parentheses:
df.apply(len())
-----------------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_11920/3211016940.py in <module>
----> 1 df.apply(len())
TypeError: len() takes exactly one argument (0 given)
By contrast, df.apply(len) works perfectly.
This is because the parameter must be a function, and parentheses are how Python distinguishes the two: len is the function object, while len(...) calls it and evaluates to its return value. With len(), Python tries to call len with no arguments before apply ever runs, hence the TypeError.
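The same failure can be reproduced without pandas. The sketch below only assumes standard Python behavior: calling len with no arguments raises a TypeError, while the bare name is an ordinary callable object that can be stored and called later.

```python
# len() is evaluated first, so the error happens before any outer call could run.
raised = False
try:
    len()
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# The bare name is just a function object that can be called later.
func = len
print(callable(func))  # True
print(func("abc"))     # 3
```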