Home > OS >  Pandas window aggregation two sorted tables
Pandas window aggregation two sorted tables

Time:12-03

I've got two tables, for example

Table A with fields A,B (sorted by A count): A,B

Column A Column B
foo1 a
foo2 b
foo3 a
foo4 d
foo5 c
foo6 a

Table B (sorted by B count):

Column B
a
b
c
d
e

I want to get a table like this (to get top 2 most popular A for each B):

Column A Column B
foo1 a
foo3 a
foo2 b
foo4 d
foo5 c

I tried to do aggregate and groupby, but I have no ideas how to create this.

CodePudding user response:

IIUC use ordered Categorical, then sorting by this column and get top2 values by GroupBy.head:

df['Column B'] = pd.Categorical(df['Column B'], 
                                ordered=True, 
                                categories=df['Column B'].unique())

If order is necessary use from another DataFrame use categories=df2['Column B']:

df['Column B'] = pd.Categorical(df['Column B'], 
                                ordered=True, 
                                categories=df2['Column B'])

df = df.sort_values('Column B').groupby('Column B').head(2)
print (df)
  Column A Column B
0     foo1        a
2     foo3        a
1     foo2        b
3     foo4        d
4     foo5        c
  • Related