I have a data frame as follows:
import pandas as pd
df = pd.DataFrame()
df['year'] = ['2015','2015','2016','2016','2017','2017']
df['months'] = [2,4,4,2,3,5]
df['perc'] = ['25','35','55','75','34','38']
This results in the following DataFrame:
   year  months perc
0  2015       2   25
1  2015       4   35
2  2016       4   55
3  2016       2   75
4  2017       3   34
5  2017       5   38
When I do a pandas groupby on the year column, the resulting DataFrame seems to contain the index of the original DataFrame together with the year inside the year column.
Command used:
result = pd.DataFrame(df.groupby('year',as_index=False).apply(lambda x: x.nlargest(1, ['months'])))
The year column appears to hold the index of the original DataFrame (the 1, 2, 5) alongside the year value:
     year  months perc
0 1  2015       4   35
1 2  2016       4   55
2 5  2017       5   38
print(result['year']) gives:
0  1    2015
1  2    2016
2  5    2017
Name: year, dtype: object
Why do I get the index of the original DataFrame inside the year column, and how do I remove it?
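Answer:
The numbers are not inside the year column; they are the index of the result. With as_index=False, apply returns a MultiIndex whose first level is a group counter (0, 1, 2) and whose second level is the original row label (1, 2, 5), so printing result['year'] shows both index levels next to the values. Please try this to drop that index and renumber the rows from 0: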
result = result.reset_index(drop=True)
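To check where the extra numbers come from, here is a minimal sketch that rebuilds the question's DataFrame and inspects the index of the apply result. The droplevel(0) option is an addition not mentioned in the answer: it removes only the outer group-counter level and keeps the original row labels. Newer pandas versions may also warn that apply operates on the grouping column; that does not change the index shown here.

```python
import pandas as pd

# Rebuild the DataFrame from the question
df = pd.DataFrame({
    'year': ['2015', '2015', '2016', '2016', '2017', '2017'],
    'months': [2, 4, 4, 2, 3, 5],
    'perc': ['25', '35', '55', '75', '34', '38'],
})

# Same call as in the question
result = df.groupby('year', as_index=False).apply(lambda x: x.nlargest(1, ['months']))

# The "extra" numbers are a two-level index, not part of the year values
print(result.index.nlevels)   # 2
print(result.index.tolist())  # [(0, 1), (1, 2), (2, 5)]

# Option 1: discard the whole index and renumber the rows from 0
print(result.reset_index(drop=True))

# Option 2: drop only the outer group-counter level, keeping labels 1, 2, 5
print(result.droplevel(0))
```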
Answer:
I am not sure what your expected output is, but pass group_keys=False
so that only the original index is kept:
(df.groupby('year',as_index=False, group_keys=False)
.apply(lambda x: x.nlargest(1, ['months']))
)
output:
   year  months perc
1  2015       4   35
2  2016       4   55
5  2017       5   38
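If what you want is simply the row with the largest months for each year, a different technique from the apply used above is idxmax plus .loc, which never creates the extra index level. A minimal sketch, assuming ties may be resolved by taking the first matching row (the same as nlargest(1) with its default keep='first'):

```python
import pandas as pd

df = pd.DataFrame({
    'year': ['2015', '2015', '2016', '2016', '2017', '2017'],
    'months': [2, 4, 4, 2, 3, 5],
    'perc': ['25', '35', '55', '75', '34', '38'],
})

# For each year, idxmax returns the original row label of the largest 'months'
# (the first such label if several rows tie for the maximum)
best_labels = df.groupby('year')['months'].idxmax()

# Select those rows by label; the original index (1, 2, 5) is preserved
top = df.loc[best_labels]
print(top)
```

Chain .reset_index(drop=True) onto top if a 0-based index is preferred.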