Home > Enterprise >  Drop lowest five values of multiindex pandas dataframe per index
Drop lowest five values of multiindex pandas dataframe per index

Time:12-02

I have a pandas multi-index dataframe with a lot of data for each year and state in the US. I would like to remove the five lowest values in the column age per year per state.

                     ID    age   ...    BMI     
Year   State                                                          
2001.0 ALASKA      1194.0  97.0  ...  20.19880
       ALASKA      1196.0  72.0  ...  27.78515
       ALASKA      1198.0  17.0  ...  19.61245
       ALASKA      1195.0  19.0  ...  26.81935
       ALASKA      1190.0  45.0  ...  29.87964
                                 ...
2019.0 ALASKA      1621.0  94.0  ...  22.87767
       ALASKA      1622.0  79.0  ...  29.80623
       ALASKA      1624.0  24.0  ...  25.12029
       ALASKA      1626.0  12.0  ...  20.15703
       ALASKA      1641.0  63.0  ...  27.47901

How can I drop every 5 lowest values in the column age for every year and every state?

CodePudding user response:

I don't know if this is the most efficient, but a solution could be (df is your DataFrame):

import pandas as pd

pd.concat([y.sort_values(by="age").iloc[5:,:] for _, y in df.groupby(df.index)])

CodePudding user response:

You could try:

df = df.reset_index()
df = df.drop(
    df.groupby(["Year", "State"]).age.nsmallest(5).index.get_level_values(2)
).set_index(["Year", "State"], drop=True)
  • Related