Why do I get index value inside column value when I do pandas groupby?

Time:03-24

I have a data frame as follows:

import pandas as pd


df = pd.DataFrame()
df['year'] = ['2015','2015','2016','2016','2017','2017']
df['months'] = [2,4,4,2,3,5]
df['perc'] = ['25','35','55','75','34','38']

Which results in dataframe:

   year  months perc
0  2015       2   25
1  2015       4   35
2  2016       4   55
3  2016       2   75
4  2017       3   34
5  2017       5   38

When I group by the year column and apply a function, the resulting DataFrame appears to have the index of the original DataFrame next to the year value inside the year column.

Command used:

result = pd.DataFrame(df.groupby('year',as_index=False).apply(lambda x: x.nlargest(1, ['months'])))

The year column seems to contain the original DataFrame's index (1, 2, 5) together with the year value:

     year  months perc
0 1  2015       4   35
1 2  2016       4   55
2 5  2017       5   38

print(result['year']) gives:

0  1    2015
1  2    2016
2  5    2017
Name: year, dtype: object

Why do I get the index of the original DataFrame inside the year column, and how do I remove it?

CodePudding user response:

Please try this:

result = result.reset_index(drop=True)
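To see why this works, it may help to inspect the index of result. The original row labels are not stored in the year column; they form the second level of a two-level index, which the printout places right next to the first column. A minimal sketch, reusing the data and command from the question (the name flat is introduced here only for illustration, and groupby.apply details have shifted between pandas versions, so the exact printout may vary):

```python
import pandas as pd

df = pd.DataFrame({
    'year': ['2015', '2015', '2016', '2016', '2017', '2017'],
    'months': [2, 4, 4, 2, 3, 5],
    'perc': ['25', '35', '55', '75', '34', '38'],
})

# Same command as in the question, without the extra pd.DataFrame wrapper.
result = df.groupby('year', as_index=False).apply(lambda x: x.nlargest(1, ['months']))

# The index has two levels: the group position and the original row label.
print(result.index.nlevels)
print(result.index.tolist())

# Dropping the index replaces both levels with a plain 0..n-1 range.
flat = result.reset_index(drop=True)  # 'flat' is a hypothetical name
print(flat.index.tolist())
```

Note that reset_index(drop=True) discards the original row labels as well; if those labels matter, dropping only the outer level is another option.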

CodePudding user response:

I am not sure what your expected output is, but you can pass group_keys=False to keep only the original index:

(df.groupby('year',as_index=False, group_keys=False)
   .apply(lambda x: x.nlargest(1, ['months']))
)

Output:

   year  months perc
1  2015       4   35
2  2016       4   55
5  2017       5   38
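If the goal is simply the row with the largest months value per year, one possible alternative avoids apply altogether. This is a different technique from the one in the answers above: idxmax returns the original row label of each group's maximum, and loc then selects those rows. A sketch assuming the question's data; top_labels and top_rows are hypothetical names:

```python
import pandas as pd

df = pd.DataFrame({
    'year': ['2015', '2015', '2016', '2016', '2017', '2017'],
    'months': [2, 4, 4, 2, 3, 5],
    'perc': ['25', '35', '55', '75', '34', '38'],
})

# Label of the row holding the largest 'months' value within each year.
top_labels = df.groupby('year')['months'].idxmax()

# Select those rows; the original single-level index is preserved.
top_rows = df.loc[top_labels]
print(top_rows)
```

On ties, idxmax keeps the first occurrence, which matches the default behaviour of nlargest(1).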