I have a dataframe of 500 rows sorted as follows:
Col1 Val1
asd 0.27
pqer 0.37
psdf 0.54
I am trying to pair each value of Col1
with another row in Col1
who have higher Val1
to get the following list:
[['asd', 'pqer'], ['asd','psdf'],['pqer','psdf']]
I am not sure how to go about this.
CodePudding user response:
Try self merge and filtering:
df.merge(df, how='cross')\
.query('Val1_x < Val1_y')[['Col1_x','Col1_y']]\
.to_numpy().tolist()
Output:
[['asd', 'pqer'], ['asd', 'psdf'], ['pqer', 'psdf']]
CodePudding user response:
A binary search could be more efficient than a cartesian join (dependent on the data size); conditional_join from pyjanitor can help with this:
# pip install pyjanitor
import pandas as pd
import janitor
(df.conditional_join(df, ('Val1', 'Val1', '<'))
.loc(axis=1)[:, 'Col1']
.to_numpy()
.tolist()
)
[['asd', 'pqer'], ['asd', 'psdf'], ['pqer', 'psdf']]