I am trying to iterate over the df['Age'] column of a DataFrame and count the digits of each value: if a value has more than 2 digits, then df['Is_age'] should be 'No', otherwise 'Yes'. Is_age is a new column I need to create based on the Age column values.
Age
23
25
<NA>
28
<NA>
I have tried the code below:
Count=0
for i, j in df['Age'].iterrows():
    if j==None:
        df['Is_age']=='other'
    else:
        while(j!=None):
            for k in j:
                Count =1
                if(Count>2):
                    df['Is_age']=='No'
                else:
                    df['Is_age']=='Yes'
But I am getting the error below:
TypeError: 'NAType' object is not iterable
Can anyone suggest a solution?
CodePudding user response:
The error itself is probably raised by for k in j, which tries to iterate over j; j is a single value (here pd.NA), not an iterable. The while(j!=None) loop is a problem as well, since j never changes inside it.
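To make the cause concrete, here is a minimal sketch that reproduces the message from the question on a single missing value. The variable names value and message are only illustrative and are not part of the original code.

```python
import pandas as pd

value = pd.NA  # a single missing value, as found in the Age column

# Iterating over a scalar NA raises the TypeError from the question.
try:
    for digit in value:
        pass
except TypeError as exc:
    message = str(exc)

print(message)
# pd.isna is a safe way to detect the missing value before doing anything else.
print(pd.isna(value))
```

Checking pd.isna(value) first avoids ever reaching code that would loop over or compare the missing value.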
I do not actually understand what you're trying to do.
If the goal is Age > 2 gives "Yes", Age <= 2 gives "No", and null gives "Other":
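import pandas as pd

def ageclass(val):
    # Handle missing values first: comparing pd.NA with a number yields pd.NA,
    # which cannot be used as the condition of an if statement.
    if pd.isna(val):
        return "Other"
    elif val > 2:
        return "Yes"
    else:
        return "No"

# Store the result in the new column instead of discarding it.
df["Is_age"] = df["Age"].apply(ageclass)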
Is that what you're trying to achieve?
CodePudding user response:
Your code has so many problems at the same time that, in my eyes, it doesn't make much sense to discuss why it fails. So let me mention what you need to know to understand the code provided below:
You can use the pd.isnull() function to test for NA values and use it in a function f(), which you then apply to the Age column of the DataFrame to obtain the Is_age column. To test whether a non-negative integer has at most two digits, you can check whether it is less than 100 (Age < 100), as follows:
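import pandas as pd

df = pd.DataFrame({'Age': [23, 101, pd.NA, 28, pd.NA]})
print(df)

def f(row):
    # Missing values get their own label; 'other' must be a string literal.
    if pd.isnull(row):
        return 'other'
    # A value below 100 has at most two digits.
    return 'Yes' if row < 100 else 'No'

df['Is_age'] = df['Age'].apply(f)
print(df)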
Here is the output of the code above:
    Age
0    23
1   101
2  <NA>
3    28
4  <NA>
    Age Is_age
0    23    Yes
1   101     No
2  <NA>  other
3    28    Yes
4  <NA>  other
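If the digits should literally be counted, as described in the question, one possible sketch converts each value to a string and takes its length. The helper names count_digits and classify are hypothetical, and the sketch assumes the Age values are integers or missing.

```python
import pandas as pd

def count_digits(value):
    # Hypothetical helper: number of decimal digits of an integer (sign ignored).
    return len(str(abs(int(value))))

def classify(value):
    # Missing values get their own label, mirroring f() above.
    if pd.isna(value):
        return 'other'
    return 'No' if count_digits(value) > 2 else 'Yes'

df = pd.DataFrame({'Age': [23, 101, pd.NA, 28, pd.NA]})
df['Is_age'] = df['Age'].apply(classify)
print(df)
```

For non-negative integers this gives the same labels as the Age < 100 comparison; the comparison is simply shorter.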
CodePudding user response:
I would suggest using vectorised operations when possible for reasons of readability and performance:
import pandas as pd
df = pd.DataFrame({'Age': [23, 25, pd.NA, 28, pd.NA]})
sum(df['Age'] > 2)
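Building the Is_age column itself can also be done without apply. The following is only a sketch of one way to do it, assuming the sample data from the previous answer, a cast to the nullable Int64 dtype, and numpy.select for the three-way choice; none of these choices come from the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [23, 101, pd.NA, 28, pd.NA]})

# Nullable integer dtype keeps the missing entries as <NA>.
age = df['Age'].astype('Int64')

missing = age.isna()
# The fill value 0 is arbitrary: those rows are labelled 'other' via `missing`.
many_digits = age.fillna(0) >= 100

# Conditions are checked in order, so missing rows win over the digit test.
df['Is_age'] = np.select(
    [missing.to_numpy(), many_digits.to_numpy(dtype=bool)],
    ['other', 'No'],
    default='Yes',
)
print(df)
```

Both masks are computed for the whole column at once, so no Python-level loop over the rows is needed.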