sklearn - Which model to find pairs

Time:02-20

I am working on a project and I want to check whether two pieces of data fit together.

My idea is to use Python and sklearn. I need to predict whether list A (100 entries) and list B (also 100 entries) fit together, and the model should tell me how likely it is that they fit.

I am pretty new to ML and I am not sure which model(s) would be best to try, or how to structure the data for training.

Would it be better to have 200 inputs that map to a 0 (don't fit) or a 1 (fit), or would it be better to have 100 inputs mapping to 100 outputs?
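A minimal sketch of the first option (200 inputs mapped to 0 or 1), assuming numeric entries. The synthetic data, the choice of `RandomForestClassifier`, and all variable names are assumptions for illustration only; `predict_proba` gives the "how likely do they fit" score.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_pairs, n_features = 2000, 100

# Synthetic stand-in data (an assumption, not the real data):
# a matching B is a noisy copy of A, a non-matching B is unrelated noise.
A = rng.normal(size=(n_pairs, n_features))
y = rng.integers(0, 2, size=n_pairs)
noise = rng.normal(size=A.shape)
B = np.where(y[:, None] == 1, A + 0.1 * noise, noise)

# One row per candidate pair: 100 entries of A followed by 100 entries of B.
X = np.hstack([A, B])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

# Column 1 of predict_proba is the estimated probability of class 1 ("fit").
proba = clf.predict_proba(X_test)[:, 1]
print(proba[:5])
```

Whether the raw concatenation is enough depends on the data; derived features such as the element-wise difference between A and B may make the relationship easier for a model to pick up.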

With 100 inputs and 100 outputs I would (at least to my understanding) try to predict the second half of the pair from the first half. I could then check how similar each of the possible candidates is to the prediction and select one based on that. But I am not sure whether that is a good approach...

Basically I want to give the model e.g. 100k pieces and it should find the 50k matching pairs.
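One possible way to turn pairwise scores into a set of pairs is sketched below. It assumes the pieces are already split into candidate first halves and second halves and that some model has produced a score for every combination; the `match_pairs` helper and the small score matrix are hypothetical. The matching step uses `scipy.optimize.linear_sum_assignment`, which is a swapped-in technique and not something sklearn does itself.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_pairs(score):
    """Pick one partner per piece so that the total score is as high as possible.

    score[i, j] is the estimated probability that first half i fits second half j.
    """
    # linear_sum_assignment minimizes by default, so ask it to maximize instead.
    rows, cols = linear_sum_assignment(score, maximize=True)
    return list(zip(rows.tolist(), cols.tolist()))


# Hypothetical scores for 3 first halves (rows) and 3 second halves (columns).
score = np.array([
    [0.9, 0.2, 0.1],
    [0.3, 0.1, 0.8],
    [0.2, 0.7, 0.4],
])

pairs = match_pairs(score)
print(pairs)  # [(0, 0), (1, 2), (2, 1)]
```

For 50k pairs a full score matrix would have 2.5 billion entries, so in practice the candidates would probably need to be narrowed down before a step like this.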

CodePudding user response:

As you are using it for ML, I assume you are using pandas. You need to combine the data frames, like the following:

frames = [df1, df2]

Detailed example:

import pandas as pd

df1 = pd.DataFrame(
    {
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    },
    index=[0, 1, 2, 3],
)
df2 = pd.DataFrame(
    {
        "A": ["A4", "A5", "A6", "A7"],
        "B": ["B4", "B5", "B6", "B7"],
        "C": ["C4", "C5", "C6", "C7"],
        "D": ["D4", "D5", "D6", "D7"],
    },
    index=[4, 5, 6, 7],
)

Then put them in a list, like the following:

frames = [df1, df2]

You can then combine the frames into a single data frame:

result = pd.concat(frames)

The output will be a single data frame with the rows of df1 followed by the rows of df2.

CodePudding user response:

OK, that's good. I was pretty sure that's the way to go, so overnight I generated a CSV file which looks like this:

A0;A1;A2;A3;A4;A5;A6;A7;1
A0;A1;A2;A3;B0;B1;B2;B3;0
B0;B1;B2;B3;A4;A5;A6;A7;0
B0;B1;B2;B3;B4;B5;B6;B7;1
etc.
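A file in this layout could be loaded roughly as follows. This is a sketch that assumes a semicolon separator, no header row, and the label in the last column; the inline `csv_text` stands in for the real file, and the `A0`/`B0` tokens are placeholders for what are presumably numeric values.

```python
import io

import pandas as pd

# Stand-in for the generated file (placeholder tokens, not real values).
csv_text = """A0;A1;A2;A3;A4;A5;A6;A7;1
A0;A1;A2;A3;B0;B1;B2;B3;0
B0;B1;B2;B3;A4;A5;A6;A7;0
B0;B1;B2;B3;B4;B5;B6;B7;1
"""

# For a file on disk, pass its path instead of the StringIO object.
df = pd.read_csv(io.StringIO(csv_text), sep=";", header=None)

X = df.iloc[:, :-1]  # all columns except the last: the two halves side by side
y = df.iloc[:, -1]   # last column: 1 = fit, 0 = don't fit

print(X.shape, y.tolist())  # (4, 8) [1, 0, 0, 1]
```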

But which model(s) would be best to try first?
