Using the np.isclose function to compare two dataframes

Time:12-14

I have two dataframes with the same size and the same variables.

df1 size: (50, 3) 
df2 size: (50, 3)

In my code, I'm using the np.isclose function, which returns a boolean array showing where two arrays are element-wise equal within a tolerance, as follows:

feature_1 = np.isclose(df1["SepalLengthCm"], df2["SepalLengthCm"], atol=10)

feature_2 = np.isclose(df1["SepalWidthCm"], df2["SepalWidthCm"], atol=20)

feature_3 = np.isclose(df1["PetalLengthCm"], df2["PetalLengthCm"], atol=30)
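For reference, a minimal sketch of what the tolerance means: np.isclose treats two values as close when |a - b| <= atol + rtol * |b|. The arrays below are made-up values, not taken from df1 or df2, and rtol is set to 0 so that only atol matters.

```python
import numpy as np

# Made-up values for illustration only (not from df1/df2)
a = np.array([5.0, 5.0, 5.0])
b = np.array([5.5, 14.0, 16.0])

# With rtol=0 the check reduces to |a - b| <= atol
close = np.isclose(a, b, rtol=0, atol=10)
print(close)  # differences are 0.5, 9 and 11, so only the last one fails
```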

The code works without any errors. The issue is that I want to make this process more general and automatic. In other words, it should work with any other dataset.

The important thing is that I don't want to specify the column names in the code. So I want to use a for loop that automatically iterates over each column and does the same thing.

Instead of having a separate line of code for each column, like feature_1, feature_2 and feature_3, I want to write one line that does the same job for any number of columns. Something like this:

feature = np.isclose(df1[column], df2[column], atol=i)

The values for atol should also be defined in advance, for example i = [10, 20, 30]

CodePudding user response:

You could define a function containing the loop you want, and call it as a one-liner:

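import numpy as np
import pandas as pd

def compare_isclose(df1, df2, atol_list):
    feature_list = []
    # Compare each column using the tolerance at the same position in atol_list
    for i, col in enumerate(df1.columns):
        feature_list.append(np.isclose(df1[col], df2[col], atol=atol_list[i]))
    # Each record holds one column's result, so transpose to rows x columns
    feature_df = pd.DataFrame.from_records(feature_list).T
    return feature_df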

The function can be used this way:

atol_list = [10,20,30]
feature_df = compare_isclose(df1, df2, atol_list)

It only works under the assumption that atol_list has one element per column of the dataframe, listed in the same order as the columns, and that the column names of the two dataframes are identical.
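If you want those assumptions checked rather than implied, here is a minimal sketch of a variant. The name compare_isclose_named and the sample data are hypothetical, made up for illustration. It raises an error when the assumptions do not hold, and it keeps the original column names instead of the integer labels that from_records produces. Rows are compared by position, so both dataframes are assumed to have the same number of rows.

```python
import numpy as np
import pandas as pd

def compare_isclose_named(df1, df2, atol_list):
    """Hypothetical variant of compare_isclose that keeps column names."""
    if len(atol_list) != len(df1.columns):
        raise ValueError("atol_list needs one tolerance per column")
    if list(df1.columns) != list(df2.columns):
        raise ValueError("both dataframes need the same columns in the same order")
    # Compare column by column; to_numpy() makes the comparison positional
    result = {
        col: np.isclose(df1[col].to_numpy(), df2[col].to_numpy(), atol=atol)
        for col, atol in zip(df1.columns, atol_list)
    }
    return pd.DataFrame(result, index=df1.index)

# Made-up sample data, not the dataset from the question
df1 = pd.DataFrame({"a": [1.0, 2.0], "b": [10.0, 50.0]})
df2 = pd.DataFrame({"a": [5.0, 20.0], "b": [25.0, 80.0]})
result = compare_isclose_named(df1, df2, [10, 20])
print(result)
```

Building the result from a dict keyed by column name means you can look up result["a"] directly instead of remembering which position each column had.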
