Python Pandas filtering rows based on column value returns NaN

Time:10-26

I have the following dictionary with some hyperparameter settings and corresponding results:

data_dict = {('Splits',): {0: 0, 1: 0, 2: 0, 3: 0},
 ('Weights',): {0: 'uniform', 1: 'uniform', 2: 'distance', 3: 'distance'},
 ('K_neighbors',): {0: 1, 1: 2, 2: 1, 3: 2},
 ('Accuracy',): {0: 0.69, 1: 0.721, 2: 0.69, 3: 0.713},
 ('AUROC',): {0: 0.558, 1: 0.524, 2: 0.558, 3: 0.532},
 ('Prec_0',): {0: 0.77, 1: 0.753, 2: 0.77, 3: 0.756},
 ('Prec_1',): {0: 0.368, 1: 0.369, 2: 0.368, 3: 0.366},
 ('Rec_0',): {0: 0.831, 1: 0.929, 2: 0.831, 3: 0.904},
 ('Rec_1',): {0: 0.285, 1: 0.119, 2: 0.285, 3: 0.159},
 ('f1_0',): {0: 0.799, 1: 0.832, 2: 0.799, 3: 0.824},
 ('f1_1',): {0: 0.321, 1: 0.18, 2: 0.321, 3: 0.222}}

Then I convert this dictionary to a pandas DataFrame:

results = pd.DataFrame(data_dict)

This gives the following DataFrame:


   Splits   Weights  K_neighbors  Accuracy  AUROC  Prec_0  Prec_1  Rec_0  Rec_1   f1_0   f1_1
0       0   uniform            1     0.690  0.558   0.770   0.368  0.831  0.285  0.799  0.321
1       0   uniform            2     0.721  0.524   0.753   0.369  0.929  0.119  0.832  0.180
2       0  distance            1     0.690  0.558   0.770   0.368  0.831  0.285  0.799  0.321
3       0  distance            2     0.713  0.532   0.756   0.366  0.904  0.159  0.824  0.222

Now I try to keep only the rows where the hyperparameter Weights is equal to 'uniform':

results[(results['Weights']=='uniform')]

However, the returned DataFrame still has all four rows, and every value is NaN except the cells that match the filter:

   Splits   Weights  K_neighbors  Accuracy  AUROC  Prec_0  Prec_1  Rec_0  Rec_1   f1_0   f1_1
0     NaN   uniform          NaN       NaN    NaN     NaN     NaN    NaN    NaN    NaN    NaN
1     NaN   uniform          NaN       NaN    NaN     NaN     NaN    NaN    NaN    NaN    NaN
2     NaN       NaN          NaN       NaN    NaN     NaN     NaN    NaN    NaN    NaN    NaN
3     NaN       NaN          NaN       NaN    NaN     NaN     NaN    NaN    NaN    NaN    NaN

The desired output is:


   Splits   Weights  K_neighbors  Accuracy  AUROC  Prec_0  Prec_1  Rec_0  Rec_1   f1_0   f1_1
0       0   uniform            1     0.690  0.558   0.770   0.368  0.831  0.285  0.799  0.321
1       0   uniform            2     0.721  0.524   0.753   0.369  0.929  0.119  0.832  0.180

CodePudding user response:

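The problem is that your columns are a MultiIndex: the dictionary keys are one-element tuples such as ('Weights',). Because of that, results['Weights'] gives back a DataFrame rather than a Series, so the comparison produces a boolean DataFrame, and indexing with a boolean DataFrame masks the non-matching cells with NaN instead of dropping rows. Flatten the column labels to plain strings first: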

results.columns = [col[0] for col in results.columns]

results[results['Weights']=='uniform']

After renaming the columns, the filter returns only the matching rows.
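
For reference, here is a minimal end-to-end sketch of the fix, assuming a shortened version of the dictionary from the question (only four of the columns are kept to save space). The names mask and filtered are introduced only for illustration. The isinstance check is a defensive assumption so the flattening also works if some labels are already plain strings.

```python
import pandas as pd

# Shortened copy of the question's dictionary: every key is a one-element tuple.
data_dict = {
    ('Splits',): {0: 0, 1: 0, 2: 0, 3: 0},
    ('Weights',): {0: 'uniform', 1: 'uniform', 2: 'distance', 3: 'distance'},
    ('K_neighbors',): {0: 1, 1: 2, 2: 1, 3: 2},
    ('Accuracy',): {0: 0.69, 1: 0.721, 2: 0.69, 3: 0.713},
}
results = pd.DataFrame(data_dict)

# Inspect the column labels before touching them: they are tuples, not strings.
print(type(results.columns).__name__)
print(list(results.columns))

# Flatten each tuple label to its first (and only) element.
results.columns = [col[0] if isinstance(col, tuple) else col
                   for col in results.columns]

# With plain string labels, the comparison yields a boolean Series,
# so indexing with it selects rows instead of masking cells.
mask = results['Weights'] == 'uniform'
filtered = results[mask]
print(filtered)
```

If you control how the dictionary is built, using plain strings such as 'Weights' as keys instead of ('Weights',) avoids the problem in the first place.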
