How to assign 0 to an index label that doesn't exist in a result DataFrame in pandas

Time:03-07

I want to calculate the confusion matrix myself (without using sklearn) for a dataset. To do that, I used the conditions and values shown below:

import numpy as np

# df_a is an existing DataFrame with 'y' and 'y_hat' columns
conditions = [
    (df_a['y']== 1) & (df_a['y_hat']== 1),
    (df_a['y']== 1) & (df_a['y_hat']== 0),
    (df_a['y']== 0) & (df_a['y_hat']== 0),
    (df_a['y']== 0) & (df_a['y_hat']== 1),
    ]

# create a list of the values we want to assign for each condition
values = ['TP', 'FN', 'TN', 'FP']

# create a new column and use np.select to assign values to it using our lists as arguments
df_a['confusion_status'] = np.select(conditions, values)

# count how often each status occurs
result = df_a['confusion_status'].value_counts().rename_axis('unique_values').to_frame('counts')
result

The result of this operation for my dataset is:

                 counts
unique_values   
TP               10000
FP               100

The result above has no TN or FN rows because the dataset contains no such cases.

Now I want to extract each count as a plain number and assign it to the variables "TP", "TN", "FP" and "FN". If a value does not exist (in this case, TN and FN), I want to get 0.

I tried using loc to extract values like

result.loc["TP"]

but this doesn't return a single number. It also won't cover missing values like TN and FN, for which I want to see 0.

CodePudding user response:

For part 1, you need to specify the column you want:

result.loc["TP","counts"]
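To illustrate the difference, here is a minimal sketch. The small frame is a hypothetical stand-in built to look like the question's `result`; it is not the asker's actual data.

```python
import pandas as pd

# Hypothetical frame shaped like the question's `result`.
result = pd.DataFrame(
    {"counts": [10000, 100]},
    index=pd.Index(["TP", "FP"], name="unique_values"),
)

row = result.loc["TP"]           # a Series holding every column of that row
tp = result.loc["TP", "counts"]  # a single scalar value

print(type(row).__name__, int(tp))
```

Selecting by row label alone returns the whole row, which is why the original attempt did not yield one number.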

For part 2, you'll need to check for presence:

TN = result.loc['TN', 'counts'] if 'TN' in result.index else 0

Or, to patch any values not present:

for v in values:
    # test against the index; `v not in result` would check column names instead
    if v not in result.index:
        result.loc[v, "counts"] = 0
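As a possible alternative to patching rows one at a time, `reindex` with `fill_value=0` can add every missing label in one step. The sketch below assumes a stand-in Series named `status` in place of `df_a['confusion_status']`, with the same counts as in the question.

```python
import pandas as pd

# Hypothetical stand-in for df_a['confusion_status']: only TP and FP occur.
status = pd.Series(["TP"] * 10000 + ["FP"] * 100, name="confusion_status")

values = ["TP", "FN", "TN", "FP"]

# reindex adds the labels that are missing and fills their counts with 0.
counts = status.value_counts().reindex(values, fill_value=0)

# Unpack the four counts into plain Python integers.
TP, FN, TN, FP = (int(counts[v]) for v in values)
print(TP, FN, TN, FP)
```

This also puts the rows in the order given by `values`, so the four variables can be unpacked directly.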