Lookup 3 or more matching numbers in dataframe-CodePudding

I'm using python and having trouble finding a code that matches 3 or more numbers and return the Id number. For example, I would like to lookup these 6 numbers in one sheet ( 2,4,6,8,9,10 )

lookup 6 number above and find 3 or more numbers in this sheet below.

I would like my result to be something like this in another excel sheet but only return the ID only if it matches 3 or more numbers. If you look at id u it matches 2,4,6,8 and id 6 matches 6,8,9,10 and id 7 matches 2,6,8

CodePudding user response：

Here is what I think you are trying to achieve. I have made the assumption that you have imported both excel files and have named them df1 and df2 respectively. With df1 being the smaller dataframe containing the values to search. I created my own dataframes for testing purposes. I only recreated df2 up to ID=7.

Here is the code to build the tables:

import pandas as pd

cols = ['Num1','Num2','Num3','Num4','Num5','Num6']
df1 = pd.DataFrame([[2,4,6,8,9,10]], columns=cols)
df2 = pd.DataFrame([[1,1,2,4,5,6,8],
                   [2,5,6,20,22,23,34],
                   [3,8,12,13,34,45,46],
                   [4,9,10,14,29,32,33],
                   [5,1,22,13,23,33,35],
                   [6,1,6,7,8,9,10],
                   [7,0,2,3,5,6,8]], 
                   columns = ['Id', 'Num1','Num2','Num3','Num4','Num5','Num6'])

And now the code to find the IDs:

# convert the values in the first dataframe to a list
vals_to_find = df1.iloc[0].tolist()

# Print the values to find
print("Vals to find:", vals_to_find)

# Create an empty list to hold the matching IDs
matching_ids = []

# iterate through the big dataframe
for index, row in df2.iterrows():

    rowlist = row.tolist()       # convert the row to a list

    # keep the id for later, and extract the other values for evaluation
    id = rowlist[0]
    vals = rowlist[1:]
    
    # count the number of values in one list against another list
    counter = sum(elem in vals_to_find for elem in vals)

    # If the number of matches is greater than 2, then grab the ID
    if counter > 2:
        matching_ids.append({'ID': id})

# Print the matching IDs 
print('Matching IDS:', matching_ids)

# Convert to dataframe and export to excel spreadsheet
df3 = pd.DataFrame(matching_ids)
df3.to_excel("output.xlsx", index=False)

OUTPUT:

Vals to find: [2, 4, 6, 8, 9, 10]
Matching IDS: [{'ID': 1}, {'ID': 6}, {'ID': 7}]

And the output.xlsx: