In Pandas df, apply function skipping NaN

Confianza
2.0
4.0
7.0
NaN

Expected Output:

Confianza
Baja
Media
Alta
NaN

In a pandas DataFrame, I want to apply this function to a column but skip NaN values:

def condiciones(df5):
    if ( df5['Confianza'] > 4 ):
        return "Alta"
    elif (df5['Confianza'] == 4 ):
        return "Media"
    else:
        return "Baja"


df5['Confianza']= df5.apply(condiciones, axis= 1)

The actual problem is that I don't want to drop the NaN rows. I tried the following, but it raises an error when applied: "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."

elif ( df5['Confianza'] != notnull):
      return NaN

CodePudding user response：

import numpy as np
import pandas as pd
df = pd.DataFrame({"Confianza": [2.0, 4.0, 7.0, None]})

def condiciones(row):
    if (row["Confianza"] is None) or (np.isnan(row["Confianza"])):
        return np.nan
    elif (row["Confianza"] > 4):
        return "Alta"
    elif (row["Confianza"] == 4):
        return "Media"
    else:
        return "Baja"


df['Confianza']= df.apply(condiciones, axis= 1)
print(df)
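
As a possible alternative, and only as a sketch: since the function needs just one column, Series.map with na_action="ignore" can skip the missing values without an explicit check inside the function. The helper name `etiqueta` and the output column `Nivel` are hypothetical, chosen only for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Confianza": [2.0, 4.0, 7.0, np.nan]})

def etiqueta(valor):
    # Hypothetical helper: receives one scalar value, never the whole column.
    if valor > 4:
        return "Alta"
    elif valor == 4:
        return "Media"
    return "Baja"

# na_action="ignore" propagates NaN without calling the function on it.
df["Nivel"] = df["Confianza"].map(etiqueta, na_action="ignore")
print(df)
```

Writing to a separate column keeps the original numbers available, which makes the result easier to check.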

CodePudding user response：

First of all, when 'axis' is set to 1, the function is applied to each row:

1 or ‘columns’: apply function to each row. https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html?highlight=apply#pandas.DataFrame.apply

So the error comes from testing a whole Series of the DataFrame 'df5' inside an 'if', instead of a single value taken from one row.
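
To make the difference concrete, here is a small sketch using the sample data from the question. It shows that a comparison on the whole column cannot be used in an 'if', while the per-row lookup under axis=1 is a plain scalar. The variable names `mask`, `mensaje` and `valores` are illustrative only.

```python
import pandas as pd

df5 = pd.DataFrame({"Confianza": [2.0, 4.0, 7.0, None]})

# Comparing the whole column yields a boolean Series, not a single bool.
mask = df5["Confianza"] > 4

mensaje = ""
try:
    if mask:  # a Series has no single truth value
        pass
except ValueError as err:
    mensaje = str(err)
    print(mensaje)

# With axis=1 each call receives one row, so the lookup is a scalar
# and the comparison gives one bool per row (NaN > 4 is False).
valores = df5.apply(lambda row: row["Confianza"] > 4, axis=1)
print(valores.tolist())
```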

You can actually get the same result without 'apply', using 'loc' instead. https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html?highlight=loc#pandas.DataFrame.loc

df5.loc[df5["Confianza"] > 4, 'Confianza2']= "Alta"
df5.loc[df5["Confianza"] == 4, 'Confianza2']= "Media"
df5.loc[df5["Confianza"].notna() & (df5["Confianza"] < 4), 'Confianza2']= "Baja"
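
For reference, a self-contained version of the 'loc' approach on the question's sample data might look like the sketch below. It assumes the same column names as above. Because any comparison against NaN evaluates to False, the NaN row matches none of the masks and stays NaN in 'Confianza2', so the explicit notna() check is optional here.

```python
import numpy as np
import pandas as pd

df5 = pd.DataFrame({"Confianza": [2.0, 4.0, 7.0, np.nan]})

# Each boolean mask selects the rows that receive a given label.
df5.loc[df5["Confianza"] > 4, "Confianza2"] = "Alta"
df5.loc[df5["Confianza"] == 4, "Confianza2"] = "Media"
# NaN < 4 is False, so the missing row is left untouched.
df5.loc[df5["Confianza"] < 4, "Confianza2"] = "Baja"
print(df5)
```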

CodePudding user response：

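Here is one way to do it using np.select. Comparisons against NaN are False, so the NaN row matches none of the conditions, and 'where' then restores NaN for it:

import numpy as np
import pandas as pd

# One condition per label; NaN fails all three comparisons.
conditions = [df['Confianza'] > 4.0,
              df['Confianza'] == 4.0,
              df['Confianza'] < 4.0]
labels = np.select(conditions, ['Alta', 'Media', 'Baja'], default='')

# Keep the labels only where the original value is not NaN.
df['Confianza'] = pd.Series(labels, index=df.index).where(df['Confianza'].notna())
df

Confianza
0   Baja
1   Media
2   Alta
3   NaN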