Confianza |
---|
2.0 |
4.0 |
7.0 |
NaN |
Expected Output:
Confianza |
---|
Baja |
Media |
Alta |
NaN |
In a pandas DataFrame, I want to apply this function to a column but skip NaN values:
def condiciones(df5):
    if df5['Confianza'] > 4:
        return "Alta"
    elif df5['Confianza'] == 4:
        return "Media"
    else:
        return "Baja"
df5['Confianza'] = df5.apply(condiciones, axis=1)
The actual problem is that I don't want to drop the NaN rows. I tried adding the branch below, but apply then raises the error "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    elif ( df5['Confianza'] != notnull):
        return NaN
CodePudding user response:
import numpy as np
import pandas as pd
df = pd.DataFrame({"Confianza": [2.0, 4.0, 7.0, None]})
def condiciones(row):
    # Return NaN unchanged so that missing rows are kept, not dropped
    if (row["Confianza"] is None) or (np.isnan(row["Confianza"])):
        return np.nan
    elif row["Confianza"] > 4:
        return "Alta"
    elif row["Confianza"] == 4:
        return "Media"
    else:
        return "Baja"
df['Confianza'] = df.apply(condiciones, axis=1)
print(df)
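Since only one column is involved, a variant of the same idea is to map over that column and let pandas skip the missing values. The following is a minimal sketch, assuming the sample data from the question; the helper name `etiqueta` and the output column `Nivel` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Confianza": [2.0, 4.0, 7.0, np.nan]})

def etiqueta(valor):
    # Hypothetical helper: only ever receives a single non-missing number
    if valor > 4:
        return "Alta"
    elif valor == 4:
        return "Media"
    return "Baja"

# na_action="ignore" propagates NaN without calling the function on it
df["Nivel"] = df["Confianza"].map(etiqueta, na_action="ignore")
print(df)
```

With this approach the function itself needs no NaN check, because `Series.map` never passes the missing value to it.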
CodePudding user response:
First of all, when 'axis' is set to 1, the function is applied to each row. From the documentation:
1 or ‘columns’: apply function to each row. https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html?highlight=apply#pandas.DataFrame.apply
So the mistake is most likely that your function ends up comparing the whole 'Confianza' series of the dataframe 'df5' instead of the single value in the current row.
You can actually get the same result without 'apply', using 'loc' instead: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html?highlight=loc#pandas.DataFrame.loc
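To make the difference concrete, here is a small sketch, assuming the sample data from the question. The names `inspeccionar`, `checks` and `mensaje` are made up for illustration. It records what the function receives with axis=1, and then shows that using a whole column inside an `if` raises the error from the question.

```python
import numpy as np
import pandas as pd

df5 = pd.DataFrame({"Confianza": [2.0, 4.0, 7.0, None]})

checks = []

def inspeccionar(row):
    # With axis=1 the argument is one row (a Series); row["Confianza"] is a scalar
    checks.append(isinstance(row, pd.Series) and np.isscalar(row["Confianza"]))
    return None

df5.apply(inspeccionar, axis=1)
print(all(checks))

# Comparing the whole column gives a boolean Series, which `if` cannot evaluate
mensaje = ""
try:
    if df5["Confianza"] > 4:
        pass
except ValueError as exc:
    mensaje = str(exc)
print(mensaje)
```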
df5.loc[df5["Confianza"] > 4, 'Confianza2'] = "Alta"
df5.loc[df5["Confianza"] == 4, 'Confianza2'] = "Media"
df5.loc[df5["Confianza"].notna() & (df5["Confianza"] < 4), 'Confianza2'] = "Baja"
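For completeness, the same three assignments as a runnable sketch, assuming the sample data from the question. The column 'Confianza2' does not need to exist beforehand: the first `loc` assignment creates it, and rows that match no condition stay NaN.

```python
import numpy as np
import pandas as pd

df5 = pd.DataFrame({"Confianza": [2.0, 4.0, 7.0, np.nan]})

# Each assignment only touches the rows selected by its boolean mask
df5.loc[df5["Confianza"] > 4, "Confianza2"] = "Alta"
df5.loc[df5["Confianza"] == 4, "Confianza2"] = "Media"
df5.loc[df5["Confianza"].notna() & (df5["Confianza"] < 4), "Confianza2"] = "Baja"

print(df5)
```

The NaN row is never selected, because every comparison against NaN evaluates to False.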
CodePudding user response:
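Here is one way to do it using np.select:
import numpy as np
import pandas as pd
# Comparisons against NaN are False, so NaN rows fall through to the default
labels = np.select([df['Confianza'] > 4.0, df['Confianza'] == 4.0],
                   ['Alta', 'Media'],
                   'Baja')
# Put NaN back wherever the original value was missing
df['Confianza'] = pd.Series(labels, index=df.index).where(df['Confianza'].notna())
df
  Confianza
0      Baja
1     Media
2      Alta
3       NaN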