I'm running a simple for-loop over a list of DataFrames, but as soon as I add an if clause it keeps erroring out.
df_list = [df1, df2, df3]
for df in df_list:
    if df in [df1, df2]:
        x = 1
    else:
        x = 2
.
.
.
ValueError: Can only compare identically-labeled DataFrame objects
Above is a simplified version of what I'm attempting. Can anyone tell me why this isn't working and how to fix it?
CodePudding user response:
You could use DataFrame.equals with any instead:
df_list = [df1, df2, df3]
for df in df_list:
    if any(df.equals(y) for y in [df1, df2]):
        x = 1
    else:
        x = 2
CodePudding user response:
You could use a better container and reference them by labels.
Equality checks on large DataFrames with object dtypes can become slow, taking seconds or more, whereas checking whether a label is in a list takes on the order of nanoseconds.
dfs = {'df1': df1, 'df2': df2, 'df3': df3}
for label, df in dfs.items():
    if label in ['df1', 'df2']:
        x = 1
    else:
        x = 2
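As a runnable sketch of the label-based approach, the snippet below uses small made-up frames in place of df1, df2 and df3; the names `special` and `x_by_label` are hypothetical and only there for illustration. The point is that only strings get compared, never the frames themselves.

```python
import pandas as pd

# Hypothetical frames standing in for df1, df2, df3
dfs = {
    "df1": pd.DataFrame({"a": [1]}),
    "df2": pd.DataFrame({"a": [2]}),
    "df3": pd.DataFrame({"a": [3]}),
}

# Labels that should get x = 1 (a set makes the membership test cheap)
special = {"df1", "df2"}

# Only the string labels are compared; no DataFrame comparison happens
x_by_label = {label: (1 if label in special else 2) for label in dfs}
```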
CodePudding user response:
You need to use df.equals()
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.equals.html
df_list = [df1, df2, df3]
for df in df_list:
    if df.equals(df1) or df.equals(df2):
        pass  # blah blah
CodePudding user response:
The following link might help: Pandas "Can only compare identically-labeled DataFrame objects" error
According to this, the data frames being compared with == must have the same columns and index; otherwise pandas raises this error.
Alternatively, you can compare the data frames using the DataFrame.equals method. Please refer to the documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.equals.html
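A minimal sketch of why the membership test fails, using two made-up frames (`a` and `b` and the helper `in_list_raises` are hypothetical, not from the question). When the objects are not the very same object, a test like `df in [df1, df2]` falls back to `==`, and that comparison is what raises; `equals` just returns a boolean.

```python
import pandas as pd

# Hypothetical frames with different column labels
a = pd.DataFrame({"x": [1, 2]})
b = pd.DataFrame({"y": [1, 2]})


def in_list_raises(df, frames):
    """Return True if the membership test `df in frames` raises ValueError."""
    try:
        df in frames
    except ValueError:
        return True
    return False


# `b in [a]` compares the two frames with ==, which fails on mismatched labels
membership_fails = in_list_raises(b, [a])

# DataFrame.equals does not raise on mismatched labels; it returns a boolean
equal_result = a.equals(b)        # different column labels
same_result = a.equals(a.copy())  # same labels and values
```

Note that even for identically labeled frames, `==` returns an element-wise DataFrame rather than a single boolean, so it is not a drop-in replacement for a membership test.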
CodePudding user response:
Do NOT use .equals() here!
It's unnecessary and slows down your program; use id() instead:
df_list = [df1, df2, df3]
for df in df_list:
    if id(df) in [id(df1), id(df2)]:
        x = 1
    else:
        x = 2
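How this identity check differs from a value comparison can be sketched as below. The frames and the helpers `is_one_of` and `equals_one_of` are hypothetical; `df1_copy` holds the same data as `df1` but is a separate object. For objects that are still alive, comparing with `is` is the same test as comparing their `id()` values.

```python
import pandas as pd

# Hypothetical frames; df1_copy has the same data as df1 but is a new object
df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"a": [3, 4]})
df1_copy = df1.copy()

targets = [df1, df2]


def is_one_of(df, frames):
    """Identity check: is df the very same object as one of the frames?"""
    return any(df is other for other in frames)


def equals_one_of(df, frames):
    """Value check: does df hold the same data as one of the frames?"""
    return any(df.equals(other) for other in frames)


identity_original = is_one_of(df1, targets)    # same object as targets[0]
identity_copy = is_one_of(df1_copy, targets)   # equal data, different object
value_copy = equals_one_of(df1_copy, targets)  # matches df1 by value
```

Which check fits depends on the intent: identity answers "is this the same object from my list?", while `equals` answers "does this frame contain the same data?".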