How do I print the cell values that cause pandas.DataFrame.any to return True?

The code below checks whether any cell in the dataframe Df3 has the same value as a cell in one of the dataframes stored in the list dataframe_arrays. However, I want to print the matching cell value and the specific dataframe within dataframe_arrays that contains it. Here is what I have tried:

import pandas as pd
dataframe_arrays = []
Df1 = pd.DataFrame({'IDs': ['Marc', 'Jake', 'Sam', 'Brad']})
dataframe_arrays.append(Df1)
Df2 = pd.DataFrame({'IDs': ['TIm', 'Tom', 'harry', 'joe', 'bill']})
dataframe_arrays.append(Df2)
Df3 = pd.DataFrame({'IDs': ['kob', 'ham', 'konard', 'jupyter', 'Marc']})
repeat = False
for i in dataframe_arrays:
    repeat = Df3.IDs.isin(i.IDs).any()
    if repeat:
        print("i = ", i)
        break

My objective is to compare my current dataframe column with columns belonging to another set of dataframes and identify which values are repeating.

Answer:

If your data is not very large, you can simply use nested loops: .iterrows() walks through Df3 row by row, and the inner loop checks each dataframe in dataframe_arrays in turn. To report which dataframe contains the duplicate, you can search globals() for the variable name bound to that dataframe object.

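def get_var_name(variable):
    # Return every global name bound to this exact object (identity check with `is`)
    globals_dict = globals()

    return [var_name for var_name in globals_dict if globals_dict[var_name] is variable]

# Outer loop: each row of Df3; inner loop: each candidate dataframe
for index, row in Df3.iterrows():
    for i in range(len(dataframe_arrays)):
        if row['IDs'] in dataframe_arrays[i]['IDs'].values:
            # [0] takes the first global name found for the matching dataframe
            print("{} is in {}".format(row['IDs'], get_var_name(dataframe_arrays[i])[0]))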

output:

> Marc is in Df1
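If you control how the dataframes are collected, a sketch that avoids the globals() lookup is to store them in a dict keyed by a label of your choosing. The dict named_frames and the labels 'Df1' and 'Df2' below are hypothetical choices for illustration, and Series.isin replaces the row-by-row .iterrows() check with a vectorized one:

```python
import pandas as pd

# Hypothetical: keep each dataframe under an explicit label instead of
# recovering its variable name through globals()
named_frames = {
    'Df1': pd.DataFrame({'IDs': ['Marc', 'Jake', 'Sam', 'Brad']}),
    'Df2': pd.DataFrame({'IDs': ['TIm', 'Tom', 'harry', 'joe', 'bill']}),
}
Df3 = pd.DataFrame({'IDs': ['kob', 'ham', 'konard', 'jupyter', 'Marc']})

matches = []  # (value, label) pairs for every repeated ID
for name, df in named_frames.items():
    # Values of Df3['IDs'] that also appear in this dataframe's IDs column
    common = Df3.loc[Df3['IDs'].isin(df['IDs']), 'IDs']
    for value in common:
        matches.append((value, name))
        print("{} is in {}".format(value, name))
```

Because the label travels with the dataframe, the result no longer depends on which other global names happen to point at the same object.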

Answer:

To print the matching values and the dataframe within dataframe_arrays that contains them, you can pass a boolean mask built with .isin() to .loc, which selects the rows of each dataframe whose IDs also appear in Df3. For example:

import pandas as pd

dataframe_arrays = []
Df1 = pd.DataFrame({'IDs': ['Marc', 'Jake', 'Sam', 'Brad']})
dataframe_arrays.append(Df1)
Df2 = pd.DataFrame({'IDs': ['TIm', 'Tom', 'harry', 'joe', 'bill']})
dataframe_arrays.append(Df2)
Df3 = pd.DataFrame({'IDs': ['kob', 'ham', 'konard', 'jupyter', 'Marc']})

# Check every dataframe (no break), so matches in later dataframes are not missed
for idx, df in enumerate(dataframe_arrays):
    # Boolean mask: True for rows of df whose ID also appears in Df3
    mask = df.IDs.isin(Df3.IDs)
    if mask.any():
        repeated_values = df.loc[mask]
        print("Repeated values in dataframe_arrays[{}]:".format(idx))
        print(repeated_values)

This code loops through the dataframes in dataframe_arrays and checks whether any value in Df3 appears in the current one. When a match is found, .loc with the boolean mask selects the rows of the current dataframe that contain the repeated values, and those rows are printed.
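If you would rather collect every match from every dataframe into a single table, one alternative sketch (a different technique from the loop above, not part of the original answer) is to stack the dataframes with pd.concat and use its keys argument to tag each row with the position of its dataframe in dataframe_arrays. The level names 'source' and 'row' are illustrative choices:

```python
import pandas as pd

dataframe_arrays = [
    pd.DataFrame({'IDs': ['Marc', 'Jake', 'Sam', 'Brad']}),
    pd.DataFrame({'IDs': ['TIm', 'Tom', 'harry', 'joe', 'bill']}),
]
Df3 = pd.DataFrame({'IDs': ['kob', 'ham', 'konard', 'jupyter', 'Marc']})

# Stack all dataframes into one; the outer index level 'source' records
# each row's position in dataframe_arrays, 'row' its original index
combined = pd.concat(dataframe_arrays,
                     keys=list(range(len(dataframe_arrays))),
                     names=['source', 'row'])

# Keep only rows whose ID also appears in Df3, then turn the index into columns
repeated = combined[combined['IDs'].isin(Df3['IDs'])].reset_index()
print(repeated)
```

Each output row then carries the repeated value together with the dataframe and row it came from, which avoids a Python-level loop over the dataframes.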