Looping through dataframe list with IF Statement

I'm running a simple for loop over a list of DataFrames, but when I add an if clause it keeps raising an error.

df_list = [df1, df2, df3]
for df in df_list:
    if df in [df1, df2]:
        x = 1
    else:
        x = 2
.
.
.
ValueError: Can only compare identically-labeled DataFrame objects

The above is a simplified version of what I'm attempting. Can anyone tell me why this isn't working and how to fix it?

CodePudding user response：

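The in operator compares list items with ==, which is element-wise for DataFrames and raises when the labels differ. You could use DataFrame.equals with any instead: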

df_list = [df1, df2, df3]
for df in df_list:
    if any(df.equals(y) for y in [df1, df2]):
        x = 1
    else:
        x = 2
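
For anyone who wants to run this end to end, here is a minimal sketch. The three small frames are made up for illustration (they are not the asker's data), and the results list is just a convenient way to record x for each frame.

```python
import pandas as pd

# Hypothetical stand-ins for df1, df2, df3 with deliberately different labels.
df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"b": [3, 4, 5]})
df3 = pd.DataFrame({"c": ["x"]})

targets = [df1, df2]
results = []
for df in [df1, df2, df3]:
    # DataFrame.equals returns a single bool and does not raise on
    # mismatched labels or shapes, so it is safe inside any().
    if any(df.equals(t) for t in targets):
        x = 1
    else:
        x = 2
    results.append(x)

print(results)
```

Note that equals compares by value, so a separate frame holding the same data as df1 would also be treated as a match.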

CodePudding user response：

You could use a better container and reference them by labels.

Equality checks on large DataFrames with object dtypes can become slow (many seconds), whereas checking whether a label is in a list takes on the order of nanoseconds.

dfs = {'df1': df1, 'df2': df2, 'df3': df3}
for label, df in dfs.items():
    if label in ['df1', 'df2']:
        x = 1
    else:
        x = 2
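
A runnable sketch of the same idea, assuming small made-up frames in place of the real ones. The special set and the x_by_label dict are names introduced here for illustration only.

```python
import pandas as pd

# Hypothetical frames; only the labels matter for the membership test.
dfs = {
    "df1": pd.DataFrame({"a": [1, 2]}),
    "df2": pd.DataFrame({"b": [3, 4, 5]}),
    "df3": pd.DataFrame({"c": ["x"]}),
}

special = {"df1", "df2"}  # labels that should get x = 1
x_by_label = {}
for label, df in dfs.items():
    # A string membership test never touches the DataFrame contents.
    if label in special:
        x = 1
    else:
        x = 2
    x_by_label[label] = x

print(x_by_label)
```

Using a set for the labels is a small design choice; a list works the same way for a handful of names.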

CodePudding user response：

You need to use df.equals()

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.equals.html

df_list = [df1, df2, df3]
for df in df_list:
    if df.equals(df1) or df.equals(df2):
        pass  # blah blah

CodePudding user response：

The following link might help: Pandas "Can only compare identically-labeled DataFrame objects" error

According to this, DataFrames compared with == must have the same columns and index; otherwise the comparison raises this error.

Alternatively, you can compare the DataFrames using the DataFrame.equals method. Please refer to the documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.equals.html
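
To make the difference concrete, here is a small sketch with two made-up frames (left and right are hypothetical names) that hold the same values under different index labels. The exact wording of the error message may vary between pandas versions.

```python
import pandas as pd

# Same values, different index labels.
left = pd.DataFrame({"a": [1, 2]}, index=[0, 1])
right = pd.DataFrame({"a": [1, 2]}, index=[10, 11])

# == needs identical labels on both sides and raises otherwise.
try:
    left == right
    raised = False
    message = ""
except ValueError as exc:
    raised = True
    message = str(exc)

# equals never raises here; it simply reports that the frames differ.
same = left.equals(right)

# With identical labels, == works but returns a DataFrame of booleans,
# not a single True/False.
elementwise = left == left.copy()

print(raised, message)
print(same)
print(elementwise)
```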

CodePudding user response：

Do NOT use .equals() here!

It's unnecessary and slows down your program; use id() instead:

df_list = [df1, df2, df3]
for df in df_list:
    if id(df) in [id(df1), id(df2)]:
        x = 1
    else:
        x = 2
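
A short sketch of how the identity check differs from equals, using made-up frames. The is operator is swapped in alongside id() here as the more common spelling of the same test; df3 is deliberately a copy of df1 to show where the two approaches disagree.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"b": [3, 4]})
df3 = df1.copy()  # same values as df1, but a distinct object

targets = [df1, df2]
frames = [df1, df2, df3]

# Identity: is this the very same object as one of the targets?
by_id = [id(df) in [id(t) for t in targets] for df in frames]
by_is = [any(df is t for t in targets) for df in frames]

# Value equality: does it hold the same data as one of the targets?
by_value = [any(df.equals(t) for t in targets) for df in frames]

print(by_id)
print(by_is)
print(by_value)
```

Which one is right depends on the intent: identity answers "is this one of those exact objects", while equals answers "does this hold the same data".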