Pair each value in a row with other rows in pandas dataframe-CodePudding

I have a dataframe of 500 rows sorted as follows:

Col1    Val1
asd     0.27
pqer    0.37
psdf    0.54

I am trying to pair each value of Col1 with another row in Col1 who have higher Val1 to get the following list:

[['asd', 'pqer'], ['asd','psdf'],['pqer','psdf']]

I am not sure how to go about this.

CodePudding user response：

Try self merge and filtering:

df.merge(df, how='cross')\
  .query('Val1_x < Val1_y')[['Col1_x','Col1_y']]\
  .to_numpy().tolist()

Output:

[['asd', 'pqer'], ['asd', 'psdf'], ['pqer', 'psdf']]

CodePudding user response：

A binary search could be more efficient than a cartesian join (dependent on the data size); conditional_join from pyjanitor can help with this:

# pip install pyjanitor
import pandas as pd
import janitor


(df.conditional_join(df, ('Val1', 'Val1', '<'))
   .loc(axis=1)[:, 'Col1']
   .to_numpy()
   .tolist()
)

 [['asd', 'pqer'], ['asd', 'psdf'], ['pqer', 'psdf']]