Find the standard deviation with index and dataframe

Time:02-27

I have a DataFrame df containing information about people. I need to compute the standard deviation of the weight of the people whose names start with N. This is the code that creates the DataFrame:

# 1. Here we import pandas
import pandas as pd
# 2. Here we import numpy
import numpy as np
np.random.seed(0)
df = pd.DataFrame({'Age':[18, 21, 28, 19, 23, 22, 18, 24, 25, 20],
                   'Hair colour':['Blonde', 'Brown', 'Black', 'Blonde', 'Blonde', 'Black','Brown', 'Brown', 'Black', 'Black'],
                   'Length (in cm)':np.random.normal(175, 10, 10).round(1),
                   'Weight (in kg)':np.random.normal(70, 5, 10).round(1)},
                  index=['Leon', 'Mirta', 'Nathan', 'Linda', 'Bandar', 'Violeta', 'Noah', 'Niji', 'Lucy', 'Mark'])

I should get a single number as a result.

First, I tried using df.loc, like so:

# 1. Here we import numpy
import numpy as np
# 2. Here we import pandas
import pandas as pd
ans_4 = df.loc[pd.Series(df.index).str.startswith('N'), 'Weight (in kg)'].std()

However, I always get this IndexingError:

```
---------------------------------------------------------------------------
    IndexingError                             Traceback (most recent call last)
    ~\AppData\Local\Temp/ipykernel_21692/106038441.py in <module>
          3 # 2. Here we import pandas
          4 import pandas as pd
    ----> 5 ans_4 = df.loc[pd.Series(df.index).str.startswith('N'), 'Weight (in kg)'].std()
    
    ~\anaconda3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
        923                 with suppress(KeyError, IndexError):
        924                     return self.obj._get_value(*key, takeable=self._takeable)
    --> 925             return self._getitem_tuple(key)
        926         else:
        927             # we by definition only have the 0th axis
    
    ~\anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_tuple(self, tup)
       1107             return self._multi_take(tup)
       1108 
    -> 1109         return self._getitem_tuple_same_dim(tup)
       1110 
       1111     def _get_label(self, label, axis: int):
    
    ~\anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_tuple_same_dim(self, tup)
        804                 continue
        805 
    --> 806             retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
        807             # We should never have retval.ndim < self.ndim, as that should
        808             #  be handled by the _getitem_lowerdim call above.
    
    ~\anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
       1142             return self._get_slice_axis(key, axis=axis)
       1143         elif com.is_bool_indexer(key):
    -> 1144             return self._getbool_axis(key, axis=axis)
       1145         elif is_list_like_indexer(key):
       1146 
    
    ~\anaconda3\lib\site-packages\pandas\core\indexing.py in _getbool_axis(self, key, axis)
        946         # caller is responsible for ensuring non-None axis
        947         labels = self.obj._get_axis(axis)
    --> 948         key = check_bool_indexer(labels, key)
        949         inds = key.nonzero()[0]
        950         return self.obj._take_with_is_copy(inds, axis=axis)
    
    ~\anaconda3\lib\site-packages\pandas\core\indexing.py in check_bool_indexer(index, key)
       2386         mask = isna(result._values)
       2387         if mask.any():
    -> 2388             raise IndexingError(
       2389                 "Unalignable boolean Series provided as "
       2390                 "indexer (index of the boolean Series and of "
    
    IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
```

What do I need to do to fix the code?

Answer:

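You are almost there. Wrapping df.index in pd.Series is unnecessary, and it is what causes the error: the new Series gets a default integer index (0 to 9) instead of the names, so when .loc tries to align that boolean Series with the DataFrame's row labels, none of them match. Calling .str.startswith directly on df.index returns a plain boolean array, which .loc applies by position without any alignment. Try: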

df.loc[df.index.str.startswith('N'),'Weight (in kg)'].std()

output: 4.261846235299126
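To check the explanation above, here is a minimal sketch (assuming pandas and NumPy as in the question, and rebuilding the question's DataFrame) that inspects the index of the failing mask and compares the fix with three equivalent alternatives. The alternatives, converting the mask with `.to_numpy()`, building the Series with `index=df.index`, and filtering row labels with `DataFrame.filter`, are my own additions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame from the question (same seed, same columns).
np.random.seed(0)
df = pd.DataFrame(
    {
        "Age": [18, 21, 28, 19, 23, 22, 18, 24, 25, 20],
        "Hair colour": ["Blonde", "Brown", "Black", "Blonde", "Blonde",
                        "Black", "Brown", "Brown", "Black", "Black"],
        "Length (in cm)": np.random.normal(175, 10, 10).round(1),
        "Weight (in kg)": np.random.normal(70, 5, 10).round(1),
    },
    index=["Leon", "Mirta", "Nathan", "Linda", "Bandar",
           "Violeta", "Noah", "Niji", "Lucy", "Mark"],
)

# The question's mask: a fresh Series over the names. It gets a default
# RangeIndex (0..9) instead of the name labels, so .loc cannot align it.
bad_mask = pd.Series(df.index).str.startswith("N")
print(bad_mask.index)
try:
    df.loc[bad_mask, "Weight (in kg)"]
    original_failed = False
except Exception as exc:  # pandas raises IndexingError here
    original_failed = True
    print(type(exc).__name__, exc)

# The answer's fix: Index.str.startswith returns a boolean array with no
# index labels, which .loc applies positionally.
mask = df.index.str.startswith("N")
std_answer = df.loc[mask, "Weight (in kg)"].std()

# Alternative 1: drop the mismatched index from the Series mask.
std_numpy = df.loc[bad_mask.to_numpy(), "Weight (in kg)"].std()

# Alternative 2: give the Series mask the DataFrame's own index so it aligns.
aligned_mask = pd.Series(df.index, index=df.index).str.startswith("N")
std_aligned = df.loc[aligned_mask, "Weight (in kg)"].std()

# Alternative 3: select rows whose label matches a regex, no mask needed.
std_filter = df.filter(regex="^N", axis=0)["Weight (in kg)"].std()

print(std_answer, std_numpy, std_aligned, std_filter)
```

All four variants select Nathan, Noah and Niji, so they should print the same value as the answer. Note that `Series.std` computes the sample standard deviation (`ddof=1`) by default, whereas `np.std` defaults to `ddof=0`; pass `ddof=0` if the population value is what you need.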
