Fastest way to filter index values based on list values from multiple columns in a Pandas DataFrame?


I have a data frame like the one below. It has two columns, col1 and col2, and from these two columns I want to filter a few value pairs (combinations taken from two lists) and get their indexes. I wrote the logic for it, but it will be too slow when filtering a larger data frame. Is there a faster way to filter the data and get the list of indexes?

Data frame:

import pandas as pd
d = {'col1': [11, 20,90,80,30], 'col2': [30, 40,50,60,90]}
df = pd.DataFrame(data=d)
print(df)
   col1  col2
0    11    30
1    20    40
2    90    50
3    80    60
4    30    90

l1 = [11, 90, 30]
l2 = [30, 50, 90]

# look up each (col1, col2) pair one at a time and collect its index
final_result = []
for i, j in zip(l1, l2):
    res = df[(df['col1'] == i) & (df['col2'] == j)]
    final_result.append(res.index[0])
print(final_result)

[0, 2, 4]

CodePudding user response:

You can use the underlying NumPy array and broadcasting to build a boolean mask:

import numpy as np

mask = (df[['col1', 'col2']].values[:, None] == np.vstack([l1, l2]).T).all(-1).any(1)
# mask
# array([ True, False,  True, False,  True])

df.index[mask]
# prints
# Int64Index([0, 2, 4], dtype='int64')
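
If the one-liner is hard to parse, the same comparison can be written step by step; this is just the mask above unpacked, with the intermediate shapes noted (variable names such as rows and targets are mine). Keep in mind the intermediate array has n_rows x n_pairs x 2 elements, so memory grows with both lists.

import numpy as np

rows = df[['col1', 'col2']].values   # shape (n_rows, 2)
targets = np.vstack([l1, l2]).T      # shape (n_pairs, 2), one (col1, col2) pair per row

# compare every row against every target pair via broadcasting:
# (n_rows, 1, 2) == (n_pairs, 2) -> (n_rows, n_pairs, 2)
eq = rows[:, None] == targets

# a row matches a pair when both columns are equal,
# and we keep the row if it matches any of the pairs
mask = eq.all(-1).any(1)

df.index[mask].tolist()
# [0, 2, 4]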

CodePudding user response:

You can build the two conditions with str.contains (note that this checks each column against its list independently, by substring, rather than matching (col1, col2) pairs, so it can pick up extra rows on other data; it gives the same result here):

condition_1 = df['col1'].astype(str).str.contains('|'.join(map(str, l1)))
condition_2 = df['col2'].astype(str).str.contains('|'.join(map(str, l2)))
final_result = df.loc[condition_1 & condition_2].index.to_list()
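
If exact per-column membership is what you want (still not pairwise matching), Series.isin avoids the string conversion and the substring issue; a minimal sketch of that variant:

# exact membership test per column; no string conversion or regex needed
condition_1 = df['col1'].isin(l1)
condition_2 = df['col2'].isin(l2)
final_result = df.loc[condition_1 & condition_2].index.to_list()
# [0, 2, 4] for the sample data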

CodePudding user response:

Here is one way to do it: merge the two DataFrames and keep the rows whose (col1, col2) values exist in both.

# create a DF of the pairs you want to match
df2 = pd.DataFrame({'col1': l1, 'col2': l2})

# merge the two DF; the indicator column records where each row was found
df3 = df.merge(df2, how='left',
               on=['col1', 'col2'], indicator='foundIn')

# keep the rows found in both frames
out = df3[df3['foundIn'].eq('both')].index.to_list()
out
[0, 2, 4]
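
One caveat: merging on columns produces a fresh RangeIndex, so the snippet above relies on df already having a default 0..n-1 index. A variant (a sketch of the same idea) that carries the original index through explicitly:

# keep the original index as a column, inner-join on the pairs,
# then read the preserved index values back out
out = (df.reset_index()
         .merge(df2, on=['col1', 'col2'])   # inner join: only matching pairs survive
         ['index']
         .tolist())
# [0, 2, 4]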