How to compare two columns in a grouped pandas dataframe?

I am unable to compare two columns inside a grouped pandas dataframe. I used the groupby method to group the rows by two columns.

I need to get the list of fields whose predicted value does not match the actual value.

file_name | page_no | field_name | value | predicted_value | actual_value
-------------------------------------------------------------------------
A            1        a            1          zx             zx
A            2        b            0          xt             xi
B            1        a            1          qw             qw
B            2        b            0          xr             xe

desired output:

Because b is the only field that is causing the mismatch between the two columns

The following is my code:

groups = df1.groupby(['file_name', 'page_no'])
a = pd.DataFrame(columns = ['file_name', 'page_no', 'value'])
for name, group in groups:
    lst = []
    if (group[group['predicted_value']] != group[group['actual_value']]):
        lst = lst.append(group[group['field_name']])
    print(lst)

I need to get the list of fields whose predicted value does not match the actual value. Here, I'm trying to store them in a list, but I am getting a KeyError.

The error is as follows:

KeyError: "None of [Index(['A', '1234'')] are in the [columns]"

CodePudding user response:

Here is a solution that compares the two columns directly, without grouping:

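# Rows where the predicted value differs from the actual value
df1 = df[df['predicted_value'] != df['actual_value']]

# Only the field_name column of those rows
s = df.loc[df['predicted_value'] != df['actual_value'], 'field_name']

# The same values as a plain Python list
L = s.tolist()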
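
To try this on the data from the question, here is a minimal, self-contained sketch. The frame is rebuilt by hand from the posted table, and the names `mask`, `mismatched`, and `unique_mismatched` are illustrative, not from the original post.

```python
import pandas as pd

# Sample data copied by hand from the table in the question (an assumption
# about the real frame's dtypes).
df = pd.DataFrame({
    'file_name': ['A', 'A', 'B', 'B'],
    'page_no': [1, 2, 1, 2],
    'field_name': ['a', 'b', 'a', 'b'],
    'value': [1, 0, 1, 0],
    'predicted_value': ['zx', 'xt', 'qw', 'xr'],
    'actual_value': ['zx', 'xi', 'qw', 'xe'],
})

# Boolean mask: True where the prediction differs from the actual value.
mask = df['predicted_value'] != df['actual_value']

# field_name of every mismatching row, as a plain list.
mismatched = df.loc[mask, 'field_name'].tolist()
print(mismatched)  # ['b', 'b']

# Distinct field names only, in order of first appearance.
unique_mismatched = df.loc[mask, 'field_name'].unique().tolist()
print(unique_mismatched)  # ['b']
```

Building the mask once and reusing it avoids repeating the comparison for each selection.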

CodePudding user response:

Does this solve your problem?


# Create a new dataframe retrieving only non-matching values
df1 = df[df['predicted_value'] != df['actual_value']]

# Store 'field_name' column in a list format
lst = list(df1['field_name'])

print(lst)
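
If the result is needed per (file_name, page_no) group, as in the question, one possible sketch is below. The KeyError in the question comes from `group[group['predicted_value']]`, which passes the values of that column as column labels; comparing `group['predicted_value']` with `group['actual_value']` directly avoids it. The dictionary name `mismatches` and the hand-built frame are assumptions for illustration.

```python
import pandas as pd

# Sample data copied by hand from the table in the question.
df = pd.DataFrame({
    'file_name': ['A', 'A', 'B', 'B'],
    'page_no': [1, 2, 1, 2],
    'field_name': ['a', 'b', 'a', 'b'],
    'value': [1, 0, 1, 0],
    'predicted_value': ['zx', 'xt', 'qw', 'xr'],
    'actual_value': ['zx', 'xi', 'qw', 'xe'],
})

# Map each (file_name, page_no) group to its mismatching field names.
mismatches = {}
for (file_name, page_no), group in df.groupby(['file_name', 'page_no']):
    # Element-wise comparison of the two columns inside this group.
    differs = group['predicted_value'] != group['actual_value']
    fields = group.loc[differs, 'field_name'].tolist()
    if fields:  # skip groups where every prediction matches
        mismatches[(file_name, page_no)] = fields

print(mismatches)
```

Two other details in the original loop are worth noting: `list.append` returns None, so `lst = lst.append(...)` discards the list, and using a whole Series as an `if` condition raises a ValueError because its truth value is ambiguous.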