I am unable to properly explain my requirement, but I can show the expected result. I have a dataframe that looks like so:
Series1 | Series2 |
---|---|
1370307 | 1370306 |
927092 | 927091 |
925392 | 925391 |
925390 | 925389 |
2344089 | 2344088 |
1827855 | 1827854 |
1715793 | 1715792 |
2356467 | 2356466 |
1463264 | 1463263 |
1712684 | 1712683 |
actual dataframe size: 902811 rows × 2 columns
Then I build a second dataframe of the unique Series2 values together with their counts, using value_counts:

```python
df2 = df['Series2'].value_counts().rename_axis('Series2').to_frame('counts').reset_index()
```
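For reference, here is how that value_counts step behaves on a small made-up sample (not the real 902811-row data) with the same two-column structure:

```python
import pandas as pd

# Hypothetical sample: each Series1 value pairs with a Series2 key that can repeat.
df = pd.DataFrame({
    'Series1': [2543114, 2547568, 2432033, 2444169, 2490928],
    'Series2': [2543113, 2543113, 2432032, 2432032, 2432032],
})

# value_counts returns the unique Series2 keys sorted by frequency, descending.
df2 = df['Series2'].value_counts().rename_axis('Series2').to_frame('counts').reset_index()
print(df2)
#    Series2  counts
# 0  2432032       3
# 1  2543113       2
```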
Next, I need the list of matching Series1 values for each Series2 value. The expected result is:
Series2 | counts | Series1_List |
---|---|---|
2543113 | 6 | [2543114, 2547568, 2559207, 2563778, 2564330, 2675803] |
2557212 | 6 | [2557213, 2557301, 2559192, 2576080, 2675693, 2712790] |
2432032 | 5 | [2432033, 2444169, 2490928, 2491392, 2528056] |
2559269 | 5 | [2559270, 2576222, 2588034, 2677710, 2713207] |
2439554 | 5 | [2439555, 2441882, 2442272, 2443590, 2443983] |
2335180 | 5 | [2335181, 2398282, 2527060, 2527321, 2565487] |
2494111 | 4 | [2494112, 2495321, 2526026, 2528492] |
2559195 | 4 | [2559196, 2570172, 2634537, 2675718] |
2408775 | 4 | [2408776, 2409117, 2563765, 2564320] |
2408773 | 4 | [2408774, 2409116, 2563764, 2564319] |
I achieve this (although only for a subset of 50 rows) using the following code:

```python
df2.loc[:50, 'Series1_List'] = df2.loc[:50, 'Series2'].apply(lambda x: df[df['Series2']==x]['Series1'].tolist())
```
If I run this on the whole dataframe, it doesn't complete even after 20 minutes. Is there a faster, more efficient way to achieve the same result?
CodePudding user response:
IIUC, use:

```python
df2 = (df.groupby('Series2', as_index=False)
         .agg(counts=('Series1', 'count'), Series1_List=('Series1', list))
      )
```
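A minimal sketch of the groupby on a made-up sample (not the asker's data). Note that groupby sorts by the Series2 key; to reproduce the descending-count order that value_counts gave, add a sort_values on counts:

```python
import pandas as pd

# Hypothetical sample with repeating Series2 keys.
df = pd.DataFrame({
    'Series1': [2543114, 2547568, 2432033, 2444169, 2490928],
    'Series2': [2543113, 2543113, 2432032, 2432032, 2432032],
})

# One groupby pass builds both the count and the list of Series1 values,
# avoiding the per-row boolean filter of the apply approach.
df2 = (df.groupby('Series2', as_index=False)
         .agg(counts=('Series1', 'count'), Series1_List=('Series1', list))
         .sort_values('counts', ascending=False, ignore_index=True))
print(df2)
#    Series2  counts                 Series1_List
# 0  2432032       3  [2432033, 2444169, 2490928]
# 1  2543113       2           [2543114, 2547568]
```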