I am unable to compare two columns inside a grouped pandas DataFrame. I used the groupby method to group the rows by two columns.
I need to get the list of fields whose predicted value does not match the actual value.
file_name | page_no | field_name | value | predicted_value | actual_value
-------------------------------------------------------------------------
A         | 1       | a          | 1     | zx              | zx
A         | 2       | b          | 0     | xt              | xi
B         | 1       | a          | 1     | qw              | qw
B         | 2       | b          | 0     | xr              | xe
Desired output:
b
This is because b is the only field whose predicted_value differs from its actual_value.
The following is my code:
groups = df1.groupby(['file_name', 'page_no'])
a = pd.DataFrame(columns = ['file_name', 'page_no', 'value'])
for name, group in groups:
    lst = []
    if (group[group['predicted_value']] != group[group['actual_value']]):
        lst = lst.append(group[group['field_name']])
    print(lst)
Here, I am trying to store the mismatched fields in a list, but I am getting a KeyError.
The error is as follows:
KeyError: "None of [Index(['A', '1234'')] are in the [columns]"
CodePudding user response:
Here is a solution that compares the two columns directly, without grouping:
df1 = df[df['predicted_value'] != df['actual_value']]
s = df.loc[df['predicted_value'] != df['actual_value'], 'field_name']
L = s.tolist()
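Note that on the sample data the list above contains b twice (once per file), whereas the desired output is a single b. A minimal sketch of one way to get each mismatching field only once, assuming the DataFrame is rebuilt from the table in the question (the names mask and mismatched_fields are illustrative, not from the original post):

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    'file_name': ['A', 'A', 'B', 'B'],
    'page_no': [1, 2, 1, 2],
    'field_name': ['a', 'b', 'a', 'b'],
    'value': [1, 0, 1, 0],
    'predicted_value': ['zx', 'xt', 'qw', 'xr'],
    'actual_value': ['zx', 'xi', 'qw', 'xe'],
})

# Element-wise comparison gives a boolean Series, True where the columns differ.
mask = df['predicted_value'] != df['actual_value']

# Select the field names of the mismatching rows and drop repeats.
mismatched_fields = df.loc[mask, 'field_name'].unique().tolist()
print(mismatched_fields)  # ['b']
```

Series.unique() keeps the order of first appearance, so the result lists the fields in the order they first mismatch.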
CodePudding user response:
Does this solve your problem?
# Create a new DataFrame containing only the non-matching rows
df1 = df[df['predicted_value'] != df['actual_value']]
# Store the 'field_name' column as a list
lst = list(df1['field_name'])
print(lst)
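As for the KeyError in the question: group[group['predicted_value']] uses the values of the predicted_value column as column labels, and no columns with those names exist. In addition, list.append returns None, so assigning its result back to lst discards the list. If the mismatches are needed per (file_name, page_no) group, the following is a rough sketch under the assumption that df1 holds the table from the question; the dictionary name mismatches is hypothetical:

```python
import pandas as pd

# Sample data copied from the table in the question.
df1 = pd.DataFrame({
    'file_name': ['A', 'A', 'B', 'B'],
    'page_no': [1, 2, 1, 2],
    'field_name': ['a', 'b', 'a', 'b'],
    'value': [1, 0, 1, 0],
    'predicted_value': ['zx', 'xt', 'qw', 'xr'],
    'actual_value': ['zx', 'xi', 'qw', 'xe'],
})

mismatches = {}
for (file_name, page_no), group in df1.groupby(['file_name', 'page_no']):
    # Compare the two columns row by row within this group.
    mask = group['predicted_value'] != group['actual_value']
    # Keep only the field names of the rows that differ.
    fields = group.loc[mask, 'field_name'].tolist()
    if fields:
        mismatches[(file_name, int(page_no))] = fields

print(mismatches)  # {('A', 2): ['b'], ('B', 2): ['b']}
```

Groups with no mismatch are left out of the dictionary, so an empty result means every predicted value matched.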