I have the following dataframe:
df = pd.DataFrame({'category': ['High', 'Central', 'Low', 'LowCentral'],
                   'outcome': ['Yes', 'No', 'Yes', 'No']})
What I want to do is map the outcome column according to the category column in the following way:
- If category == High, outcome = Yes
- If category == Central, outcome = Maybe
- If category == Low, outcome = No
I have tried
for i, row in df.iterrows():
    if df.loc[i, 'category'].str.contains('High'):
        df.loc[i, 'outcome'] = 'Yes'
    elif df.loc[i, 'category'].str.contains('Central'):
        df.loc[i, 'outcome'] = 'Maybe'
    elif df.loc[i, 'category'].str.contains('Low'):
        df.loc[i, 'outcome'] = 'No'
but I get the following error:
AttributeError: 'str' object has no attribute 'str'
I also tried to use the 'map' function:
df['category'] = df['outcome'].map({'High':'Yes', 'Central':'Maybe', 'Low':'No'})
But this produced NaN in the outcome column for the 4th row (LowCentral), which is not what I want. I want to keep the existing outcome values for categories that are not included in the mapping.
Any help would be greatly appreciated!
CodePudding user response:
Your terminology is a little mixed up: what you want is to map the category
column. You were close with your map solution:
df['outcome'] = df['category'].map({'High':'Yes', 'Central':'Maybe', 'Low':'No'}).fillna(df['category'])
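Note that the line above fills unmapped rows with the category value itself (so LowCentral ends up in outcome). If the goal is instead to keep the existing outcome value for those rows, as the question seems to ask, a minimal sketch could fill from the outcome column. The variable name `mapping` is introduced here only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'category': ['High', 'Central', 'Low', 'LowCentral'],
                   'outcome': ['Yes', 'No', 'Yes', 'No']})

# Hypothetical name for the lookup table used in the question.
mapping = {'High': 'Yes', 'Central': 'Maybe', 'Low': 'No'}

# Categories missing from the mapping become NaN after map();
# fillna() then restores the original outcome for exactly those rows.
df['outcome'] = df['category'].map(mapping).fillna(df['outcome'])
print(df)
```

With this variant the LowCentral row keeps its original outcome of 'No' rather than receiving the category name.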
CodePudding user response:
Take a look at pandas.Series.replace and consider the following example:
import pandas as pd
df = pd.DataFrame({'category': ['High', 'Central', 'Low', 'LowCentral'],'outcome': ['Yes', 'No', 'Yes', 'No']})
df['outcome'] = df['category'].replace({'High':'Yes','Central':'Maybe','Low':'No'})
print(df)
Output:
     category     outcome
0        High         Yes
1     Central       Maybe
2         Low          No
3  LowCentral  LowCentral
Note that values not present in the mapping are left unchanged.
CodePudding user response:
Try this one.
import pandas as pd
df = pd.DataFrame({'category': ['High', 'Central', 'Low', 'LowCentral'],
                   'outcome': ['Yes', 'No', 'Yes', 'No']})
for i, row in df.iterrows():
    if 'High' in df.loc[i, 'category']:
        df.loc[i, 'outcome'] = 'Yes'
    elif 'Low' in df.loc[i, 'category']:
        df.loc[i, 'outcome'] = 'No'
    elif 'Central' in df.loc[i, 'category']:
        df.loc[i, 'outcome'] = 'Maybe'
print(df)
[Output]
     category outcome
0        High     Yes
1     Central   Maybe
2         Low      No
3  LowCentral      No
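The same substring logic can also be written without iterrows, which tends to be slow on large frames. The following is only a sketch under the assumption that the first matching rule should win, in the same order as the loop above ('High', then 'Low', then 'Central'); the names `rules` and `matched` are made up for this example:

```python
import pandas as pd

df = pd.DataFrame({'category': ['High', 'Central', 'Low', 'LowCentral'],
                   'outcome': ['Yes', 'No', 'Yes', 'No']})

# Hypothetical rule list: (substring to look for, outcome to assign).
# Order matters: earlier rules take priority, like the if/elif chain.
rules = [('High', 'Yes'), ('Low', 'No'), ('Central', 'Maybe')]

# Tracks rows that an earlier rule has already claimed.
matched = pd.Series(False, index=df.index)

for substring, value in rules:
    # Plain substring test on the whole column, skipping claimed rows.
    hit = df['category'].str.contains(substring, regex=False) & ~matched
    df.loc[hit, 'outcome'] = value
    matched |= hit

print(df)
```

Rows that match none of the substrings are never assigned, so they keep whatever outcome they already had.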