I want to calculate the confusion matrix for a dataset myself (without using sklearn). To do that, I used a list of conditions and a list of values, as shown below:
import numpy as np

conditions = [
    (df_a['y'] == 1) & (df_a['y_hat'] == 1),
    (df_a['y'] == 1) & (df_a['y_hat'] == 0),
    (df_a['y'] == 0) & (df_a['y_hat'] == 0),
    (df_a['y'] == 0) & (df_a['y_hat'] == 1),
]
# the label to assign for each condition, in the same order as `conditions`
values = ['TP', 'FN', 'TN', 'FP']
# create a new column and use np.select to assign a label to each row;
# the default is a string so it shares a dtype with the string labels
df_a['confusion_status'] = np.select(conditions, values, default='other')
# count how often each label occurs
result = df_a['confusion_status'].value_counts().rename_axis('unique_values').to_frame('counts')
# display the counts
result
The result of this operation for my dataset is:
               counts
unique_values
TP              10000
FP                100
In the result above there are no TN or FN rows, because my dataset contains no such cases.
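As an aside, a label only shows up in value_counts if at least one row carries it, which is why TN and FN disappear. A minimal sketch of a different approach (not the one used above) is to count each condition directly: summing a boolean Series counts its True entries, so a condition that never matches gives 0. The small df_a below is made-up toy data, assuming binary 0/1 columns named y and y_hat.

```python
import pandas as pd

# Hypothetical toy data standing in for df_a: true labels in 'y', predictions in 'y_hat'.
df_a = pd.DataFrame({
    "y":     [1, 1, 1, 0, 0],
    "y_hat": [1, 1, 1, 1, 1],
})

y = df_a["y"]
y_hat = df_a["y_hat"]

# Summing a boolean Series counts the True entries, so an outcome that never
# occurs yields 0 instead of a missing row.
TP = int(((y == 1) & (y_hat == 1)).sum())
FN = int(((y == 1) & (y_hat == 0)).sum())
TN = int(((y == 0) & (y_hat == 0)).sum())
FP = int(((y == 0) & (y_hat == 1)).sum())

print(TP, FN, TN, FP)  # 3 0 0 2
```

With this toy data every prediction is 1, so FN and TN come out as 0 without any special handling.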
Now I want to extract these results as plain numbers and assign them to the variables "TP", "TN", "FP" and "FN". If a value does not exist (here, TN and FN), I want to get 0.
I tried using loc to extract the values, like:
result.loc["TP"]
but this doesn't return a single number. It also doesn't handle missing values like TN and FN, for which I want to get 0.
Answer:
For part 1, you need to specify the column as well as the row label, so that loc returns a single value instead of the whole row:
result.loc["TP","counts"]
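To illustrate the difference, here is a small sketch that rebuilds a stand-in for result from the counts shown in the question (the DataFrame construction is an assumption, not the asker's code). A row label alone gives back a one-element Series; adding the column label gives a scalar, and DataFrame.at is the scalar-only accessor for the same lookup.

```python
import pandas as pd

# Hypothetical stand-in for `result`, mirroring the counts shown in the question.
result = pd.DataFrame(
    {"counts": [10000, 100]},
    index=pd.Index(["TP", "FP"], name="unique_values"),
)

# A row label alone returns the whole row as a Series ...
row = result.loc["TP"]
print(type(row).__name__)  # Series

# ... while row label + column label returns a single scalar.
TP = int(result.loc["TP", "counts"])
# .at is the scalar-only accessor and gives the same value.
assert TP == result.at["TP", "counts"]
print(TP)  # 10000
```

The int() call turns the NumPy integer into a plain Python int, which is optional but convenient for printing or JSON output.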
For part 2, you'll need to check for presence first:
if 'TN' in result.index:
    TN = result.loc['TN', 'counts']
else:
    TN = 0
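A shorter variant of the presence check, sketched under the same assumption about how result looks: Series.get(key, default) returns the default when the label is absent, so the check and the lookup collapse into one call per variable.

```python
import pandas as pd

# Hypothetical stand-in for `result` with only TP and FP present.
result = pd.DataFrame(
    {"counts": [10000, 100]},
    index=pd.Index(["TP", "FP"], name="unique_values"),
)

counts = result["counts"]

# Series.get(key, default) returns the default when the label is missing,
# folding the presence check and the lookup into a single call.
TP = int(counts.get("TP", 0))
TN = int(counts.get("TN", 0))
FP = int(counts.get("FP", 0))
FN = int(counts.get("FN", 0))

print(TP, TN, FP, FN)  # 10000 0 100 0
```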
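Or, to patch in any values that are not present:
for v in values:
    # check the row labels (result.index); `v not in result` would test the column names
    if v not in result.index:
        result.loc[v, "counts"] = 0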
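Another way to patch the missing rows, offered as a sketch rather than as the answer's own method, is DataFrame.reindex with fill_value=0: it adds every label from values that is missing, fills it with 0, and puts the rows in the same order as values so they can be unpacked in one line. The result DataFrame below is a hypothetical stand-in built from the question's output.

```python
import pandas as pd

values = ["TP", "FN", "TN", "FP"]

# Hypothetical stand-in for `result` with only TP and FP present.
result = pd.DataFrame(
    {"counts": [10000, 100]},
    index=pd.Index(["TP", "FP"], name="unique_values"),
)

# reindex orders the rows like `values` and fills the labels that were
# missing (TN, FN) with 0 instead of NaN.
result = result.reindex(values, fill_value=0)

# The rows now follow the order of `values`, so they can be unpacked directly.
TP, FN, TN, FP = (int(c) for c in result["counts"])
print(TP, FN, TN, FP)  # 10000 0 0 100
```

Because fill_value is an integer, the missing rows are filled with 0 rather than NaN; without it, the new rows would be NaN.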