I'm experimenting with the pandas .apply() method and noticed that it doesn't seem possible to use it to calculate a summary statistic on a Series.
You can use .apply() to modify all values in a series, e.g.,
```python
import pandas as pd

test = pd.Series([1, 2, 3])
t_output = test.apply(lambda a: a * 2)
t_output
```
yielding
```
0    2
1    4
2    6
dtype: int64
```
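To make the element-wise behaviour visible, the sketch below passes a small recording function to `.apply()` (the name `record_and_double` is hypothetical, chosen only for illustration). It shows that the function is called once per value and receives a single number each time, never the whole Series.

```python
import numpy as np
import pandas as pd

test = pd.Series([1, 2, 3])

seen = []  # collects every argument .apply() passes to the function


def record_and_double(value):
    # Hypothetical helper: remember what we were given, then double it
    seen.append(value)
    return value * 2


t_output = test.apply(record_and_double)

print(len(seen))                            # 3: one call per element
print(all(np.ndim(v) == 0 for v in seen))   # True: each argument is a scalar
print(t_output.tolist())                    # [2, 4, 6]
```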
However, if I wanted to generate a summary statistic, e.g.,
```python
import numpy as np

t_output2 = test.apply(np.sum)
t_output2
```
I just get the original series values:
```
0    1
1    2
2    3
dtype: int64
```
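A quick way to see why the values come back unchanged is to call `np.sum` on a single number, which is what happens for each element. This sketch wraps the call in a lambda so the per-element call is explicit; it assumes nothing beyond standard NumPy and pandas behaviour.

```python
import numpy as np
import pandas as pd

test = pd.Series([1, 2, 3])

# np.sum of one number has nothing to add up, so it returns the number itself
print(np.sum(2))  # 2

# Applying that per element therefore reproduces the original values
per_element = test.apply(lambda value: np.sum(value))
print(per_element.tolist())  # [1, 2, 3]
```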
I know I could instead write test.sum() and get the desired summary statistic. But I'm interested in better understanding the technical reason why .apply() doesn't seem to permit generating one for an isolated series.
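For comparison, here is a sketch of calls that hand the *whole* Series to the reduction instead of each element. `.agg()` and `.pipe()` are standard pandas Series methods; `.pipe(f)` simply calls `f(series)` once.

```python
import numpy as np
import pandas as pd

test = pd.Series([1, 2, 3])

# Built-in reduction method
print(test.sum())         # 6

# .agg() with a reduction name reduces the whole series
print(test.agg("sum"))    # 6

# .pipe() calls the function once, with the entire Series as its argument
print(test.pipe(np.sum))  # 6

# Calling the NumPy function directly on the Series also reduces it
print(np.sum(test))       # 6
```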
I know that a Series is size-immutable, but that shouldn't matter here, because I'm assigning the .apply() output to a new object. So any details that help me understand this better would be appreciated!
Answer:
A DataFrame is two-dimensional, so DataFrame.apply can pass each column (axis=0, the default) or each row (axis=1) to your function as a Series, and a function like np.sum collapses each of those to a single number. A Series is one-dimensional, so Series.apply calls the function once per element, passing each value on its own. If the entries of the Series themselves have dimensions (e.g. they are lists), summing each entry can be useful. But if they are plain numbers, np.sum of a single number just returns that number: there is no dimension left to sum over, so the result is the identity. To reduce the whole Series, call the reduction on the Series itself, e.g. test.sum().
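The contrast above can be sketched as follows. The DataFrame `df` and the list-valued Series `nested` are made-up illustration data; the point is only what each `.apply()` hands to the function.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# DataFrame.apply passes a whole column (axis=0, the default) or row (axis=1),
# so a summing function collapses each one to a single number
col_sums = df.apply(lambda column: np.sum(column))
row_sums = df.apply(lambda row: np.sum(row), axis=1)
print(col_sums.tolist())  # [6, 15]
print(row_sums.tolist())  # [5, 7, 9]

# Series.apply passes one element at a time; summing only does something
# useful when each element is itself a collection, such as a list
nested = pd.Series([[1, 2], [3, 4, 5]])
print(nested.apply(lambda values: np.sum(values)).tolist())  # [3, 12]
```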