I have the following dataframe:
Country Sector
AU Semiconductor
AU Telecom
AU Banking
AU Telecom
AU Semiconductor
AU Telecom
SE Telecom
SE Semiconductor
SE Semiconductor
SE Semiconductor
SE Banking
SE Telecom
With following code I have created multi-index with the number of instances for each type of sector by country:
df.groupby(["Country","Sector"]).size()
The result is:
Country Sector
AU Banking 1
Semiconductor 2
Telecom 3
SE Banking 1
Semiconductor 3
Telecom 2
What I want to do is to sort the values by group and only show the top 2 sectors and add the rest of the sectors togheter in "Other":
AU Telecom 3
Semiconductor 2
Other 1
SE Semiconductor 3
Telecom 2
Other 1
CodePudding user response:
You can use rank
to
rank = df.groupby('Country')['Sector'].rank(method='dense', ascending=False)
out = (df.assign(Sector=df['Sector'].where(rank < 3, other='Other'))
.value_counts(['Country', 'Sector'])
.rename('Count').reset_index()
.sort_values(['Country', 'Count'], ascending=[True, False], ignore_index=True))
Output
>>> out
Country Sector Count
0 AU Telecom 3
1 AU Semiconductor 2
2 AU Other 1
3 SE Semiconductor 3
4 SE Telecom 2
5 SE Other 1
CodePudding user response:
You can do it step by step
s = df.value_counts().sort_values(ascending=False)
s1 = s.groupby(level = 0).head(2)
s2 = s.drop(s1.index,axis=0)
s2 = s2.groupby(level=0).sum()
s2.index = pd.MultiIndex.from_product([s2.index,['Other']])
out = pd.concat([s1,s2]).sort_index(level=0)
Out[391]:
Country Sector
AU Other 1
Semiconductor 2
Telecom 3
SE Other 1
Semiconductor 3
Telecom 2
dtype: int64