I have a pandas multi-index dataframe with a lot of data for each year and state in the US. I would like to remove the five lowest values in the column age
per year per state.
ID age ... BMI
Year State
2001.0 ALASKA 1194.0 97.0 ... 20.19880
ALASKA 1196.0 72.0 ... 27.78515
ALASKA 1198.0 17.0 ... 19.61245
ALASKA 1195.0 19.0 ... 26.81935
ALASKA 1190.0 45.0 ... 29.87964
...
2019.0 ALASKA 1621.0 94.0 ... 22.87767
ALASKA 1622.0 79.0 ... 29.80623
ALASKA 1624.0 24.0 ... 25.12029
ALASKA 1626.0 12.0 ... 20.15703
ALASKA 1641.0 63.0 ... 27.47901
How can I drop every 5 lowest values in the column age for every year and every state?
CodePudding user response:
I don't know if this is the most efficient, but a solution could be (df
is your DataFrame):
import pandas as pd
pd.concat([y.sort_values(by="age").iloc[5:,:] for _, y in df.groupby(df.index)])
CodePudding user response:
You could try:
df = df.reset_index()
df = df.drop(
df.groupby(["Year", "State"]).age.nsmallest(5).index.get_level_values(2)
).set_index(["Year", "State"], drop=True)