How to modify loop so as to take NaN values from values in columns in DataFrame in Pandas Python?

Time:08-03

Here is a sample of my Python code:

...

for col in df.columns.tolist():
    if val in df[f"{col}"].values:
       if val.isna():
          my_list.append(col)

So, if a column in my DataFrame contains a NaN value, its name should be added to "my_list".

I know my DataFrame has columns with NaN values, but my code produces an empty "my_list". The error is probably in the line if val.isna():. How can I modify it? How can I "tell" Python to pick up NaN values from the columns?

CodePudding user response:

Just check each column for NaN values directly, like this:

for col in df.columns.tolist():
    # isna() marks the missing entries; any() is True if at least one exists
    if df[col].isna().any():
        my_list.append(col)

I am not giving you the best way of doing it, just fixing your loop.

CodePudding user response:

By iterating over the values in each column, appending the column name to my_list at the first NaN, and then breaking out of the inner loop, you get this:

my_list = ['col1','col3']

My code:

import pandas as pd
# Use lowercase nan: the NaN alias was removed in NumPy 2.0
from numpy import nan

df = pd.DataFrame(data={
    "col1":[10,2.5,nan],
    "col2":[10,2.5,3.5],
    "col3":[5,nan,1]})
my_list = []

for col in df.columns:
    for val in df[col].values:
        if pd.isna(val):
            my_list.append(col)
            break
print(f"{my_list=}")
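As a side note on why the original loop never matched anything: NaN compares unequal to everything, including itself, so a membership test against a NumPy array cannot find it, and a plain float has no .isna() method to call. The sketch below illustrates this with a small made-up array; the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, one of them missing
values = np.array([10.0, 2.5, np.nan])

# NaN is not equal to anything, not even to itself
nan_equals_itself = np.nan == np.nan

# `in` on a NumPy array compares with ==, so it cannot find NaN
found_by_membership = np.nan in values

# A plain float has no .isna() method, so val.isna() raises AttributeError
float_has_isna = hasattr(float("nan"), "isna")

# pd.isna tests for missingness explicitly
found_by_isna = bool(pd.isna(values).any())

print(nan_equals_itself, found_by_membership, float_has_isna, found_by_isna)
# prints: False False False True
```

This is why pd.isna(val) is used in the loop above instead of an equality or membership check.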

CodePudding user response:

You can fix your code with the changes that @Orange mentioned. I'm just adding this as an alternative. When working with data, you want to let the database or data analysis software do the heavy lifting. Looping over values one at a time, like looping over a cursor, is something you should avoid whenever you can.

The code you have can be changed to:

for col in df.columns:
    if df[col].hasnans:
        my_list.append(col)

The code below does the same thing in a single expression:

df.columns[[df[col].hasnans for col in df.columns]].to_list()

The code below gets the same result without hasnans, by counting the missing values in each column with isna and sum:

df.columns[df.isna().sum() > 0].to_list()
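If you want to convince yourself that these variants agree, here is a minimal self-contained check. It assumes a small sample frame like the one in the earlier answer, and it also includes df.isna().any() as one more equivalent spelling that was not part of the snippets above.

```python
import numpy as np
import pandas as pd

# Sample frame (assumed for illustration): col1 and col3 contain NaN
df = pd.DataFrame({
    "col1": [10, 2.5, np.nan],
    "col2": [10, 2.5, 3.5],
    "col3": [5, np.nan, 1],
})

# Column-by-column check with hasnans
via_loop = [col for col in df.columns if df[col].hasnans]

# Boolean mask built from hasnans
via_mask = df.columns[[df[col].hasnans for col in df.columns]].to_list()

# Count missing values per column and keep the non-zero ones
via_sum = df.columns[df.isna().sum() > 0].to_list()

# Extra variant: any() instead of sum() > 0
via_any = df.columns[df.isna().any()].to_list()

print(via_loop)
# prints: ['col1', 'col3']
print(via_loop == via_mask == via_sum == via_any)
# prints: True
```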