I have a data frame like the one below, with two columns col1 and col2.
From these two columns I want to filter a few values (pairwise combinations of two lists) and get their indexes. I wrote the logic for it, but it will be too slow when filtering a larger data frame. Is there a faster way to filter the data and get the list of indexes?
Data frame:
import pandas as pd

d = {'col1': [11, 20, 90, 80, 30], 'col2': [30, 40, 50, 60, 90]}
df = pd.DataFrame(data=d)
print(df)
   col1  col2
0    11    30
1    20    40
2    90    50
3    80    60
4    30    90
l1 = [11, 90, 30]
l2 = [30, 50, 90]
final_result = []
# look up each (col1, col2) pair one row-filter at a time -- this is the slow part
for i, j in zip(l1, l2):
    res = df[(df['col1'] == i) & (df['col2'] == j)]
    final_result.append(res.index[0])
print(final_result)
[0, 2, 4]
CodePudding user response:
You can just use the underlying numpy array and create a boolean mask for indexing:
import numpy as np

# compare every (col1, col2) row against every (l1, l2) pair in a single broadcast
mask = (df[['col1', 'col2']].values[:, None] == np.vstack([l1, l2]).T).all(-1).any(1)
# mask
# array([ True, False,  True, False,  True])
df.index[mask]
# prints
# Int64Index([0, 2, 4], dtype='int64')
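An equivalent pandas-only variant (just a sketch, reusing df, l1 and l2 from the question) builds a MultiIndex from the two columns and tests membership against the zipped pairs:
# treat each row as a (col1, col2) tuple and test it against the target pairs
pairs = list(zip(l1, l2))
mask = pd.MultiIndex.from_frame(df[['col1', 'col2']]).isin(pairs)
print(df.index[mask].tolist())  # [0, 2, 4]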
CodePudding user response:
you can use:
condition_1 = df['col1'].astype(str).str.contains('|'.join(map(str, l1)))
condition_2 = df['col2'].astype(str).str.contains('|'.join(map(str, l2)))
final_result = df.loc[condition_1 & condition_2].index.to_list()
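Note that str.contains does regex substring matching and checks the two columns independently rather than as paired values; with the sample data the result is still [0, 2, 4]. A slightly stricter variant (a sketch, assuming pandas >= 1.1) anchors to whole values with str.fullmatch, so that 11 does not also match 110:
# require the whole string to match one of the list values, not just a substring
condition_1 = df['col1'].astype(str).str.fullmatch('|'.join(map(str, l1)))
condition_2 = df['col2'].astype(str).str.fullmatch('|'.join(map(str, l2)))
print(df.loc[condition_1 & condition_2].index.to_list())  # [0, 2, 4]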
CodePudding user response:
Here is one way to do it: merge the two DataFrames and keep the rows that exist in both.
# create a DF of the pairs you'd like to match against
df2 = pd.DataFrame({'col1': l1, 'col2': l2})

# merge the two DF, tagging where each row was found
df3 = df.merge(df2, how='left',
               on=['col1', 'col2'], indicator='foundIn')

# keep only the rows found in both
out = df3[df3['foundIn'].eq('both')].index.to_list()
out
[0, 2, 4]
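For the sample data the merged frame looks like the output below; the left merge keeps the original row order, so the positions of df3 line up with the index of df, which is why filtering on 'both' recovers the matching indexes.
print(df3)
#    col1  col2    foundIn
# 0    11    30       both
# 1    20    40  left_only
# 2    90    50       both
# 3    80    60  left_only
# 4    30    90       both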