I have two DataFrames with the same shape and the same columns.
df1 size: (50, 3)
df2 size: (50, 3)
In my code, I'm using the np.isclose function, which returns a boolean array that is True where two arrays are element-wise equal within a tolerance:
feature_1 = np.isclose(df1["SepalLengthCm"], df2["SepalLengthCm"], atol=10)
feature_2 = np.isclose(df1["SepalWidthCm"], df2["SepalWidthCm"], atol=20)
feature_3 = np.isclose(df1["PetalLengthCm"], df2["PetalLengthCm"], atol=30)
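As a quick illustration of what np.isclose checks: two values count as close when |a - b| <= atol + rtol * |b|, where rtol defaults to 1e-05. The arrays below are made-up values, not the question's data:

```python
import numpy as np

a = np.array([1.0, 5.0, 100.0])
b = np.array([1.5, 20.0, 95.0])

# |1.0 - 1.5| = 0.5 and |100 - 95| = 5 are within atol=10; |5 - 20| = 15 is not
close = np.isclose(a, b, atol=10)
print(close)  # [ True False  True]
```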
The code works without any errors. The issue is that I want to make this process more general and automatic; in other words, it should work with any other dataset.
The important thing is that I don't want to specify the column names in the code, so I want to use a for loop to iterate over each column automatically and do the same thing.
Instead of writing a separate line for each column (feature_1, feature_2, feature_3), I want a single line that does the same job for any number of columns. Something like this:
feature = np.isclose(df1[column], df2[column], atol=i)
The atol values should also be predefined, for example i = [10, 20, 30].
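The loop described above can be sketched by zipping the column names with the tolerances and collecting the results in a dict keyed by column name. This is a minimal sketch: the small df1/df2 below are hypothetical stand-ins that only reuse the question's column names, and it assumes the tolerances are listed in the same order as the columns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for df1/df2 (only the column names come from the question)
df1 = pd.DataFrame({"SepalLengthCm": [5.1, 4.9], "SepalWidthCm": [3.5, 3.0], "PetalLengthCm": [1.4, 1.4]})
df2 = pd.DataFrame({"SepalLengthCm": [5.0, 30.0], "SepalWidthCm": [3.4, 3.1], "PetalLengthCm": [1.3, 1.5]})

atols = [10, 20, 30]  # one tolerance per column, in column order

# A single loop over (column, tolerance) pairs replaces feature_1, feature_2, feature_3
features = {}
for column, tol in zip(df1.columns, atols):
    features[column] = np.isclose(df1[column], df2[column], atol=tol)

print(features["SepalLengthCm"])  # [ True False]
```

Note that zip stops silently at the shorter of its inputs, so a tolerance list of the wrong length is not reported; on Python 3.10+, zip(..., strict=True) raises a ValueError instead.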
Answer:
You could define a function containing the loop you want, and call it as a one-liner:
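import numpy as np
import pandas as pd

def compare_isclose(df1, df2, atol_list):
    feature_list = []
    # Pair each column with the tolerance at the same position in atol_list
    for i, col in enumerate(df1.columns):
        feature_list.append(np.isclose(df1[col], df2[col], atol=atol_list[i]))
    # from_records gives one row per column, so transpose to get one column per feature
    feature_df = pd.DataFrame.from_records(feature_list).T
    return feature_df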
The function can be used this way:
atol_list = [10,20,30]
feature_df = compare_isclose(df1, df2, atol_list)
This only works if atol_list has exactly one entry per column of the DataFrames and both DataFrames have identical column names.
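If the loop is not needed, np.isclose can compare the whole frames in one call, because its tolerance test is evaluated with ordinary NumPy broadcasting: an atol array of shape (n_columns,) is applied column by column to (n_rows, n_columns) arrays. The sketch below is a hypothetical variant (compare_isclose_vectorized is not part of any library) that also checks the two assumptions above explicitly and keeps the original column names and index, which the from_records/transpose approach replaces with integer labels. Like the loop version, it compares rows by position. The sample frames are made up.

```python
import numpy as np
import pandas as pd

def compare_isclose_vectorized(df1, df2, atol_list):
    """Hypothetical loop-free variant of compare_isclose."""
    if len(atol_list) != df1.shape[1]:
        raise ValueError("atol_list needs one tolerance per column")
    # Reorder df2's columns to match df1; raises KeyError if a column name is missing
    right = df2[df1.columns]
    # atol of shape (n_columns,) broadcasts across the rows of both arrays
    mask = np.isclose(df1.to_numpy(), right.to_numpy(), atol=np.asarray(atol_list))
    return pd.DataFrame(mask, index=df1.index, columns=df1.columns)

df1 = pd.DataFrame({"a": [1.0, 2.0], "b": [10.0, 20.0]})
df2 = pd.DataFrame({"a": [1.5, 9.0], "b": [25.0, 21.0]})
result = compare_isclose_vectorized(df1, df2, [1, 10])
print(result)
```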