Home > Blockchain >  Pandas vlookup like operation to a list
Pandas vlookup like operation to a list

Time:06-17

I am unable to properly explain my requirement, but I can show the expected result. I have a dataframe that looks like so:

Series1 Series2
1370307 1370306
927092 927091
925392 925391
925390 925389
2344089 2344088
1827855 1827854
1715793 1715792
2356467 2356466
1463264 1463263
1712684 1712683

actual dataframe size: 902811 rows × 2 columns

then another dataframe of unique values of Series2. This I've done using value counts.

df2 = df['Series2'].value_counts().rename_axis('Series2').to_frame('counts').reset_index()

Then I need a list of matching Series1 values for each Series2 value: The expected result is:

Series2 counts Series1_List
2543113 6 [2543114, 2547568, 2559207, 2563778, 2564330, 2675803]
2557212 6 [2557213, 2557301, 2559192, 2576080, 2675693, 2712790]
2432032 5 [2432033, 2444169, 2490928, 2491392, 2528056]
2559269 5 [2559270, 2576222, 2588034, 2677710, 2713207]
2439554 5 [2439555, 2441882, 2442272, 2443590, 2443983]
2335180 5 [2335181, 2398282, 2527060, 2527321, 2565487]
2494111 4 [2494112, 2495321, 2526026, 2528492]
2559195 4 [2559196, 2570172, 2634537, 2675718]
2408775 4 [2408776, 2409117, 2563765, 2564320]
2408773 4 [2408774, 2409116, 2563764, 2564319]

I achieve this (although only for a subset of 50 rows) using the following code:

df2.loc[:50,'Series1_List'] = df2.loc[:50,'Series2'].apply(lambda x: df[df['Series2']==x]['Series1'].tolist())

If I do this for the whole dataframe it wouldn't complete even in 20 minutes. So the question is whether there is a faster and efficient method of achieving the result?

CodePudding user response:

IIUC, use:

df2 = (df.groupby('Series2', as_index=False)
         .agg(counts=('Series1', 'count'), Series1_List=('Series1', list))
       )
  • Related