Home > Software design >  Finding Matching Values Between Multiple Dataframes
Finding Matching Values Between Multiple Dataframes

Time:09-22

So i wanna create a dataframe based on the matching values between 11 others with the reference dataframe. And the 11 dataframes has a column named 'Serial Number' which i wanna compare with the Serial ID column on the reference dataframe.

Due to the fact that i didn't thought of way to loop all 12 excel files into separated variables. I just writed the below code.

Imported all libraries needed

import pandas as pd

from matplotlib import pyplot as plt

from google.colab import drive

drive.mount('/content/drive')

Specified the path in which all the excel files are stored

directory = '/content/drive/MyDrive/Colab Notebooks/Ursa project'

And did filename = pd.read_excel('path') for all 12 files

Now that i created all dataframe objects i need to find which rows of these 11 dataframes has matching values between the Serial Number column with the Serial ID column in the reference dataframe.

My failed attempt was to do the following for each dataframe:

for i in reference_df['Serial ID']:

  df_matches = df1[df1['Serial number'] == i]

df_matches

P.S.: i'm using Colab

CodePudding user response:

I think one of the problems i'm stumbling upon would be the fact that although there's matching between the 'Serial Number' columns and the Serial ID column, those matches aren't exact matches. So let's say if 1001 is present on both columns, but in the Serial Number cell we have "s/n:1001" and in the Serial ID cell we have just "1001"

The code won't understand it as a match, so how do i go about to filter these little differences?

CodePudding user response:

If what you are trying to match is a part of a string, you can use df.Series.str.contains.

df_matches = []
for i in reference_df['Serial ID']:
    df_matches.append(df1[df1['Serial number'].str.contains(i)])
  • Related