I have a DataFrame whose index has two levels: Name and SubName. I would like to truncate this DataFrame so that only the top 3 rows are kept for each Name-level value. For example, given the DataFrame
import pandas as pd
df = pd.DataFrame()
df["Name"] = ["Name1", "Name1", "Name1", "Name1"]
df["SubName"] = ["SubName1", "SubName2", "SubName3", "SubName4"]
df["Value"] = [1,2,3,4]
df.set_index(["Name", "SubName"], inplace = True)
df
I would like to drop the last (fourth) row, and likewise for every Name group when the DataFrame has more than one Name-level value.
CodePudding user response:
Use groupby.head with the level Name as the grouper:
out = df.groupby(level='Name').head(3)
output:
                Value
Name  SubName
Name1 SubName1      1
      SubName2      2
      SubName3      3
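To make it clearer what groupby.head selects, here is a minimal sketch that rebuilds the question's DataFrame and compares the result with a boolean mask built from groupby.cumcount, which numbers the rows 0, 1, 2, ... within each group. The names mask and out_mask are only illustrative; they are not part of the original answer.

```python
import pandas as pd

# Rebuild the DataFrame from the question
df = pd.DataFrame(
    {
        "Name": ["Name1"] * 4,
        "SubName": ["SubName1", "SubName2", "SubName3", "SubName4"],
        "Value": [1, 2, 3, 4],
    }
).set_index(["Name", "SubName"])

# head(3) keeps the first three rows of every Name group, in their original order
out = df.groupby(level="Name").head(3)

# Equivalent selection: keep rows whose position within their group is below 3
mask = df.groupby(level="Name").cumcount() < 3
out_mask = df[mask]

print(out.equals(out_mask))
```

Both approaches keep the original MultiIndex and row order, so the row positions within each group decide what is kept, not the values.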
More complex example:
names = ['Name1', 'Name2', 'Name3']
subnames = ['SubName1', 'SubName2', 'SubName3', 'SubName4']
idx = pd.MultiIndex.from_product([names, subnames], names=['Name', 'SubName'])
df = pd.DataFrame({'Value': range(len(idx))}, index=idx)
out = df.groupby(level='Name').head(3)
output:
                Value
Name  SubName
Name1 SubName1      0
      SubName2      1
      SubName3      2
Name2 SubName1      4
      SubName2      5
      SubName3      6
Name3 SubName1      8
      SubName2      9
      SubName3     10
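Note that groupby.head takes the first 3 rows of each group in their existing order. If "top 3" is instead meant as the 3 largest values per Name, which is an assumption and not something the question states, one possible sketch is to sort by Value first and then take the head of each group. The name top is hypothetical.

```python
import pandas as pd

names = ["Name1", "Name2", "Name3"]
subnames = ["SubName1", "SubName2", "SubName3", "SubName4"]
idx = pd.MultiIndex.from_product([names, subnames], names=["Name", "SubName"])
df = pd.DataFrame({"Value": range(len(idx))}, index=idx)

# Sort so the largest values come first, then keep 3 rows per Name group
top = (
    df.sort_values("Value", ascending=False)
      .groupby(level="Name")
      .head(3)
      .sort_index()  # restore the Name/SubName ordering for readability
)
print(top)
```

With this data the sketch keeps SubName2 through SubName4 for each Name, because those rows hold the three largest values in every group.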