Pandas: AttributeError: 'float' object has no attribute 'isnull'

Input df

ID      Date    TAVG  TMAX  TMIN
1   01-01-2020         26    21
2   01-01-2020   15    16    
3   01-01-2020   25    29    18
1   02-01-2020   16          16
2   02-01-2020         26    20
.....

The code I am using

for index, row in df.iterrows():

    if [(row["TMIN"].isnull()) & (row["TAVG"].notnull()) & (row["TMAX"].notnull())]:
        row["TMIN"] = (2 * row["TAVG"]) - row["TMAX"]

    if [(row["TMAX"].isnull()) & (row["TMIN"].notnull()) & (row["TAVG"].notnull())]:
        row["TMAX"] = (2 * row["TAVG"]) - row["TMIN"]

    if [(row["TAVG"].isnull()) & (row["TMIN"].notnull()) & (row["TMAX"].notnull())]:
        row["TAVG"] = (row["TMIN"] + row["TMAX"]) / 2

When I run this, I get the below error:

    if [(row["TMIN"].isnull()) & (row["TAVG"].notnull()) & (row["TMAX"].notnull())]:
AttributeError: 'float' object has no attribute 'isnull'

How to fix this? Any alternate way to achieve the same result?

CodePudding user response:

.isnull() and .notnull() are methods of Series/columns (and of DataFrames). Inside the loop, row["TMIN"] is a single element of a row, that is, a scalar value (which happens to be a float), and a float has no such methods. That causes the error.
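To make the difference concrete, here is a small sketch of the scalar versus Series behaviour. The one-row Series is made up for illustration; pd.isna() and pd.notna() are the pandas functions that accept scalars, so they are what you would use if you really had to test a single value.

```python
import numpy as np
import pandas as pd

# A single made-up row, similar to what iterrows() hands you.
row = pd.Series({"TAVG": 15.0, "TMAX": 16.0, "TMIN": np.nan})

value = row["TMIN"]               # a scalar float, not a Series
print(type(value))
print(hasattr(value, "isnull"))   # False: scalars have no .isnull()

# The module-level functions work on scalars:
print(pd.isna(value))             # True
print(pd.notna(row["TAVG"]))      # True

# The methods work on the Series as a whole:
print(row.isnull())
```

This only explains the error; it is not a recommendation to keep the loop.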

For a lot of cases in Pandas, you shouldn't iterate over the rows individually: work column-wise instead, and skip the loop.

Your particular check can be written column-wise like this:

sel = df['TMIN'].isnull() & df['TAVG'].notnull() & df['TMAX'].notnull()
df.loc[sel, 'TMIN'] = df.loc[sel, 'TAVG'] * 2 - df.loc[sel, 'TMAX']

and similar for the other two columns. All without any iterrows() or other loop.
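Spelled out for all three columns, the mask approach might look like the sketch below. The DataFrame is rebuilt by hand from the five sample rows in the question, reading the blank cells as NaN, so treat the data as an assumption.

```python
import numpy as np
import pandas as pd

# Sample rows from the question; blanks are assumed to be NaN.
df = pd.DataFrame({
    "ID": [1, 2, 3, 1, 2],
    "Date": ["01-01-2020"] * 3 + ["02-01-2020"] * 2,
    "TAVG": [np.nan, 15, 25, 16, np.nan],
    "TMAX": [26, 16, 29, np.nan, 26],
    "TMIN": [21, np.nan, 18, 16, 20],
})

# TMIN missing, the other two present
sel = df["TMIN"].isnull() & df["TAVG"].notnull() & df["TMAX"].notnull()
df.loc[sel, "TMIN"] = df.loc[sel, "TAVG"] * 2 - df.loc[sel, "TMAX"]

# TMAX missing, the other two present
sel = df["TMAX"].isnull() & df["TMIN"].notnull() & df["TAVG"].notnull()
df.loc[sel, "TMAX"] = df.loc[sel, "TAVG"] * 2 - df.loc[sel, "TMIN"]

# TAVG missing, the other two present
sel = df["TAVG"].isnull() & df["TMIN"].notnull() & df["TMAX"].notnull()
df.loc[sel, "TAVG"] = (df.loc[sel, "TMIN"] + df.loc[sel, "TMAX"]) / 2

print(df)
```

Each mask is a boolean Series, so the & operator combines the conditions element-wise for every row at once.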

However, since you are apparently trying to replace NaNs/null values with values from other columns, you can use .fillna() here:

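df['TMIN'] = df['TMIN'].fillna(df['TAVG'] * 2 - df['TMAX'])

(You will also see this written as df['TMIN'].fillna(..., inplace=True). In recent pandas versions that chained inplace call triggers a warning, and with Copy-on-Write enabled it does not modify df at all, so assigning the result back is the safer form.)

or, if you want to keep the original column untouched (or want to use the result directly in a chained computation), store the result in a new column: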

df['tmin2'] = df['TMIN'].fillna(df['TAVG'] * 2 - df['TMAX'])

and for the other two columns:

df['tmax2'] = df['TMAX'].fillna(2 * df['TAVG'] - df['TMIN'])
df['tavg2'] = df['TAVG'].fillna((df['TMIN'] + df['TMAX']) / 2)

You may ask what happens if a TMIN cell is null and the TAVG value, the TMAX value, or both are null as well. In that case the computed replacement is itself null, so you'd be replacing null with null and nothing changes. Given the conditions in your original if statements, that matches what your original code intended.
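A quick way to convince yourself of that is the sketch below. The three rows are made up so that the middle one is missing two values. All replacements are computed from the original columns before anything is assigned, which mirrors the "other two must be present" conditions in the question.

```python
import numpy as np
import pandas as pd

# Made-up data: row 1 is missing both TAVG and TMIN.
df = pd.DataFrame({
    "TAVG": [15.0, np.nan, 25.0],
    "TMAX": [16.0, 26.0, 29.0],
    "TMIN": [np.nan, np.nan, 18.0],
})

# Compute every replacement from the original columns first.
tmin = df["TMIN"].fillna(2 * df["TAVG"] - df["TMAX"])
tmax = df["TMAX"].fillna(2 * df["TAVG"] - df["TMIN"])
tavg = df["TAVG"].fillna((df["TMIN"] + df["TMAX"]) / 2)

# Assign the results back in one step.
df = df.assign(TMIN=tmin, TMAX=tmax, TAVG=tavg)
print(df)
```

Row 0 gets its TMIN filled in, while row 1 keeps its NaNs because any arithmetic involving NaN yields NaN.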