Standard deviation of dataframe?

Time:10-28

Since dataframe.std() is deprecated, we should use groupby now: https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.GroupBy.std.html

However, for a simple standard deviation like this:

dataframe['number'].std()

groupby seems like an unnecessarily long command to me. Here "dataframe" has a column 'number' with values ranging from 1 to 100.

What would the above line look like using groupby?

CodePudding user response:

I think there is a misunderstanding of the docs.

What pandas is deprecating is specifically the level parameter, in favor of its groupby counterpart (the link you shared). Nowhere does it say that pandas.Series.std is deprecated as a whole:

level: int or level name, default None. If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a scalar.

Deprecated since version 1.3.0: The level keyword is deprecated. Use groupby instead.

and:

numeric_only: bool, default None. Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

Deprecated since version 1.5.0: Specifying numeric_only=None is deprecated. The default value will be False in a future version of pandas.
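To make the two deprecations concrete, here is a minimal sketch of what they affect. The data, index names, and column names are made up for illustration. It assumes a reasonably recent pandas and avoids the level keyword entirely, since that is the part being phased out.

```python
import pandas as pd

# Hypothetical Series with a two-level MultiIndex (names are illustrative).
idx = pd.MultiIndex.from_product(
    [["a", "b"], [1, 2, 3]], names=["group", "item"]
)
s = pd.Series([1.0, 2.0, 3.0, 10.0, 20.0, 30.0], index=idx)

# Replacement for the deprecated s.std(level="group"):
# group on the index level, then take the std of each group.
per_group = s.groupby(level="group").std()

# A plain std over the whole Series is unaffected by the deprecation.
overall = s.std()

# For a DataFrame with mixed dtypes, pass numeric_only explicitly
# instead of relying on the deprecated numeric_only=None default.
df = pd.DataFrame({"number": [1, 2, 3], "label": ["x", "y", "z"]})
numeric_std = df.std(numeric_only=True)
```

Here per_group holds one standard deviation per value of the "group" level, while overall is a single scalar.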

Given the line of code you propose, I see no reason for changing it. Keep using:

df['col'].std()
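As a quick check against the setup described in the question, the sketch below builds a column 'number' holding 1 to 100 and calls std() on it directly. The 'bucket' column is a hypothetical addition, included only to show where groupby would actually be the right tool: when you want one standard deviation per group.

```python
import pandas as pd

# Column 'number' ranging from 1 to 100, as described in the question.
dataframe = pd.DataFrame({"number": range(1, 101)})

# Plain column standard deviation (sample std, ddof=1 by default).
result = dataframe["number"].std()  # roughly 29.01

# groupby only makes sense when there is something to group by.
# 'bucket' is a made-up column: "low" for 1-50, "high" for 51-100.
dataframe["bucket"] = ["low" if n <= 50 else "high" for n in dataframe["number"]]
per_bucket = dataframe.groupby("bucket")["number"].std()
```

For a single column with no grouping, the direct call is both shorter and the intended API.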