I've got two tables, for example
Table A with fields A,B (sorted by A count): A,B
Column A | Column B |
---|---|
foo1 | a |
foo2 | b |
foo3 | a |
foo4 | d |
foo5 | c |
foo6 | a |
Table B (sorted by B count):
Column B |
---|
a |
b |
c |
d |
e |
I want to get a table like this (to get top 2 most popular A for each B):
Column A | Column B |
---|---|
foo1 | a |
foo3 | a |
foo2 | b |
foo4 | d |
foo5 | c |
I tried to do aggregate and groupby, but I have no ideas how to create this.
CodePudding user response:
IIUC use ordered Categorical
, then sorting by this column and get top2 values by GroupBy.head
:
df['Column B'] = pd.Categorical(df['Column B'],
ordered=True,
categories=df['Column B'].unique())
If order is necessary use from another DataFrame
use categories=df2['Column B']
:
df['Column B'] = pd.Categorical(df['Column B'],
ordered=True,
categories=df2['Column B'])
df = df.sort_values('Column B').groupby('Column B').head(2)
print (df)
Column A Column B
0 foo1 a
2 foo3 a
1 foo2 b
3 foo4 d
4 foo5 c