How to skip NA values while iterating through a column of a DataFrame

I am trying to iterate over the df['Age'] column of a DataFrame and count the digits of each value: if a value has more than 2 digits, df['Is_age'] should be 'No', otherwise 'Yes' (missing values should get 'other'). Is_age is a new column I need to create based on the values of the Age column.

Age
23
25
<NA>
28
<NA>

I have tried the code below:

Count=0
for i, j in df['Age'].iterrows():
   if j==None:
      df['Is_age']=='other'
   else:
      while(j!=None):
         for k in j:
            Count +=1
         if(Count>2):
            df['Is_age']=='No'
         else:
            df['Is_age']=='Yes'

But I am getting the following error:

TypeError: 'NAType' object is not iterable

Can anyone suggest a solution?

Answer:

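The error itself is probably raised by for k in j rather than by while (j != None): the for loop tries to iterate over j, which is a single value (pd.NA in the missing rows), not an iterable. Note also that j == None is not a reliable test for pd.NA; use pd.isna(j) instead.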
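A minimal sketch of both points, assuming a plain list of the sample values instead of the DataFrame column: missing values are detected with pd.isna() and skipped, and, if counting digits is what the for k in j loop was for, iterating over the string form of a number is one way to do it. The last part reproduces the TypeError from the question.

```python
import pandas as pd

ages = [23, 25, pd.NA, 28, pd.NA]  # sample values from the question
digit_counts = []

for value in ages:
    # pd.isna() recognises pd.NA (as well as None and NaN)
    if pd.isna(value):
        digit_counts.append(None)  # missing: nothing to count
        continue
    # A number is not iterable, but its string form is (one character per digit)
    digit_counts.append(len(str(value)))

print(digit_counts)

# Looping over the scalar pd.NA itself raises the error from the question
try:
    for k in pd.NA:
        pass
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```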

I do not actually understand what you're trying to do.

If the rule is that Age > 2 gives "Yes", Age <= 2 gives "No" and a missing value gives "Other", you could do:

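import pandas as pd

def ageclass(val):
    # Check for missing values first: pd.NA > 2 evaluates to pd.NA,
    # and using pd.NA in an if statement raises a TypeError
    if pd.isna(val):
        return "Other"
    if val > 2:
        return "Yes"
    return "No"

df["Is_age"] = df["Age"].apply(ageclass)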

Is that what you're trying to achieve?

Answer:

Your code has so many problems at once that, in my view, it doesn't make much sense to discuss why it fails. Instead, here is what you need to know to understand the code below:

You can use the pd.isnull() function to test for NA values inside a function f(), which you then apply to the Age column of the DataFrame to obtain the Is_age column. A non-negative integer has more than two digits exactly when it is 100 or greater, so Age < 100 identifies the ages with at most two digits ('Yes'):

import pandas as pd

df = pd.DataFrame({'Age': [23, 101, pd.NA, 28, pd.NA]})
print(df)

def f(age):
    # Missing values (pd.NA, None, NaN) get the label 'other'
    if pd.isnull(age):
        return 'other'
    # Less than 100 means at most two digits
    return 'Yes' if age < 100 else 'No'

df['Is_age'] = df['Age'].apply(f)
print(df)

Here is the output of the code above:

    Age
0    23
1   101
2  <NA>
3    28
4  <NA>

    Age Is_age
0    23    Yes
1   101     No
2  <NA>  other
3    28    Yes
4  <NA>  other
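The Age < 100 check is a shortcut for the digit counting the question attempts. As a sketch, assuming the ages are non-negative integers (digit_count is a hypothetical helper, not part of pandas), counting the characters of each number's string form gives the same result for the non-missing values, while dropna() skips the NA rows instead of iterating over them:

```python
import pandas as pd

def digit_count(age):
    """Hypothetical helper: number of decimal digits in a non-negative integer."""
    return len(str(int(age)))

ages = pd.Series([23, 101, pd.NA, 28, pd.NA])
non_missing = ages.dropna()  # drops the pd.NA entries

# "At most two digits" and "less than 100" should agree for non-negative integers
by_digits = non_missing.apply(lambda a: digit_count(a) <= 2)
by_threshold = non_missing.apply(lambda a: a < 100)

print(by_digits.tolist(), by_threshold.tolist())
```

The threshold version avoids the string conversion, which is why the answer above uses it.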

Answer:

I would suggest using vectorised operations when possible for reasons of readability and performance:

import pandas as pd
df = pd.DataFrame({'Age': [23, 25, pd.NA, 28, pd.NA]})
sum(df['Age'] > 2)
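sum(df['Age'] > 2) counts the matching values rather than building the Is_age column. A vectorised sketch of the column itself, assuming the labels from the question (more than two digits gives 'No', missing gives 'other', otherwise 'Yes'); numpy.select is a swapped-in choice here, not something the answer above uses:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [23, 101, pd.NA, 28, pd.NA]})

# Nullable integer dtype: missing values stay as pd.NA
age = df['Age'].astype('Int64')

# Conditions are evaluated on the whole column at once, with no Python loop
missing = age.isna().to_numpy()
# age >= 100 is <NA> for missing rows; treat those as False here
three_plus_digits = (age >= 100).fillna(False).to_numpy(dtype=bool)

# The first matching condition wins; rows matching none get the default
df['Is_age'] = np.select([missing, three_plus_digits], ['other', 'No'], default='Yes')
print(df)
```

Because np.select checks the conditions in order, the missing-value test is listed first.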