How to leave only three top rows for each index level in pandas DataFrame

Time:11-10

I have a DataFrame whose index has two levels: Name and SubName. I would like to truncate it so that only the top (first) three rows are kept for each Name-level index value. For example, given this DataFrame:

import pandas as pd

df = pd.DataFrame()
df["Name"] = ["Name1", "Name1", "Name1", "Name1"]
df["SubName"] = ["SubName1", "SubName2", "SubName3", "SubName4"]
df["Value"] = [1,2,3,4]

df.set_index(["Name", "SubName"], inplace=True)

df

I would like to drop the last (fourth) row, and do the same for every Name group when the DataFrame contains more than one Name value.

Answer:

Use groupby(...).head, grouping by the Name index level:

out = df.groupby(level='Name').head(3)

output:

                Value
Name  SubName        
Name1 SubName1      1
      SubName2      2
      SubName3      3
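The same filter can also be written as a boolean mask, which may be handy if you want to combine it with other conditions. This is a sketch of an alternative, not part of the original answer: groupby(...).cumcount() numbers the rows inside each Name group as 0, 1, 2, ..., and keeping positions below 3 gives the first three rows per group. The names pos and out are just illustrative.

```python
import pandas as pd

# Rebuild the single-Name example from the question
df = pd.DataFrame({
    "Name": ["Name1"] * 4,
    "SubName": ["SubName1", "SubName2", "SubName3", "SubName4"],
    "Value": [1, 2, 3, 4],
}).set_index(["Name", "SubName"])

# cumcount numbers the rows inside each Name group as 0, 1, 2, ...
pos = df.groupby(level="Name").cumcount()

# Keep only rows whose position within their Name group is below 3
out = df[pos < 3]
print(out)
```

Like head, this keeps the rows in their original order and leaves the MultiIndex untouched.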

A more complex example with several Name groups:

names = ['Name1', 'Name2', 'Name3']
subnames = ['SubName1', 'SubName2', 'SubName3', 'SubName4']

idx = pd.MultiIndex.from_product([names, subnames], names=['Name', 'SubName'])
df = pd.DataFrame({'Value': range(len(idx))}, index=idx)

out = df.groupby(level='Name').head(3)

output:

                Value
Name  SubName        
Name1 SubName1      0
      SubName2      1
      SubName3      2
Name2 SubName1      4
      SubName2      5
      SubName3      6
Name3 SubName1      8
      SubName2      9
      SubName3     10
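groupby.head keeps the first three rows of each group in their current order. If "top" should instead mean the three largest Value entries per Name (an interpretation the question does not state, so treat this as an assumption), a sketch is to sort by Value in descending order before taking the head, then sort the index again for display. The name top3 is illustrative.

```python
import pandas as pd

names = ["Name1", "Name2", "Name3"]
subnames = ["SubName1", "SubName2", "SubName3", "SubName4"]
idx = pd.MultiIndex.from_product([names, subnames], names=["Name", "SubName"])
df = pd.DataFrame({"Value": range(len(idx))}, index=idx)

# Sort descending so each group's largest values come first,
# then keep the first 3 rows of every Name group.
top3 = (
    df.sort_values("Value", ascending=False)
      .groupby(level="Name")
      .head(3)
      .sort_index()  # restore Name/SubName order for display
)
print(top3)
```

With this data each group keeps SubName2 to SubName4, because the values increase along SubName.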