Why can't you calculate a summary statistic on a series with .apply?

Time:09-06

I'm experimenting with the .apply() method and noticed that pandas doesn't seem to allow using it to calculate a summary statistic on a series.

You can use .apply() to modify all values in a series, e.g.,

import numpy as np
import pandas as pd

test = pd.Series([1,2,3])

t_output = test.apply(lambda a : a *2)

t_output

yielding a new series with each value doubled (2, 4, 6).

However, if I wanted to generate a summary statistic, e.g.,

t_output2 = test.apply(np.sum)

t_output2

I just get the original series values (1, 2, 3) back, unchanged.

I know I could instead write test.sum() and get the desired summary statistic. But I'm interested in better understanding the technical reason why .apply() doesn't seem to permit generating one for an isolated series.

I know that a series is size-immutable, but that shouldn't be consequential here because I'm reassigning the .apply() output to a new object. So any details on better understanding this would be appreciated!

User response:

A DataFrame is two-dimensional, so you can apply a function along either axis (i.e. across rows or down columns), and each call receives a whole row or column. A Series is one-dimensional, so .apply() calls the function on each element (i.e. each value) separately. Of course, if the entries of the series were themselves sequences, summation could be a useful function to apply. But when they are plain numbers, summing a single number just returns that number, because there is no dimension left to summarize (i.e. sum) over.
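The difference described above can be seen in a short sketch. The names `nested` and `df` and their values are made up for illustration and do not come from the question. The sketch assumes a reasonably recent pandas, where `Series.apply` maps a plain callable over the elements.

```python
import numpy as np
import pandas as pd

test = pd.Series([1, 2, 3])

# Series.apply calls the function once per element, so np.sum receives a
# single scalar each time and returns it unchanged.
elementwise = test.apply(np.sum)

# To reduce the whole series, use the built-in method or Series.agg.
total_method = test.sum()
total_agg = test.agg("sum")

# Illustrative data: when each entry is itself a list, a per-element sum
# does something meaningful.
nested = pd.Series([[1, 2], [3, 4, 5]])
nested_sums = nested.apply(np.sum)

# Illustrative data: DataFrame.apply hands a whole column (axis=0) or a
# whole row (axis=1) to the function, so a reduction works there.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
column_sums = df.apply(np.sum, axis=0)
row_sums = df.apply(np.sum, axis=1)
```

So for a single series, `test.sum()` or `test.agg("sum")` is the closest counterpart to what `df.apply(np.sum)` does for each column.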
