Iterating and replacing dataframe column values

Time:10-28

I have the following dataframe:

df = pd.DataFrame({'category': ['High', 'Central', 'Low', 'LowCentral'],
               'outcome': ['Yes', 'No', 'Yes', 'No']})

What I want to do is map the outcome column according to the category column in the following way:

  • If category == High, outcome = Yes
  • If category == Central, outcome = Maybe
  • If category == Low, outcome = No

I have tried

for i, row in df.iterrows():
    if df.loc[i, 'category'].str.contains('High'):
       df.loc[i, 'outcome'] = 'Yes'
    elif df.loc[i, 'category'].str.contains('Central'):
       df.loc[i, 'outcome'] = 'Maybe'
    elif df.loc[i, 'category'].str.contains('Low'):
       df.loc[i, 'outcome'] = 'No'

but I get the following error:

AttributeError: 'str' object has no attribute 'str'

I also tried to use the 'map' function:

df['category'] = df['outcome'].map({'High':'Yes', 'Central':'Maybe', 'Low':'No'})

But this left NaN in the outcome column for the 4th row (LowCentral), which is not what I want. I want to keep the existing outcome value for any category that is not included in the mapping.

Any help would be greatly appreciated!

CodePudding user response:

Your terminology is a little mixed up. What you want is to map the category column and assign the result to outcome. You were close with your map solution:

df['outcome'] = df['category'].map({'High':'Yes', 'Central':'Maybe', 'Low':'No'}).fillna(df['category'])
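Note that the one-liner above fills unmapped rows with the category value, so LowCentral ends up with 'LowCentral' as its outcome. If the goal is to keep the existing outcome for unmapped categories, as the question asks, a minimal sketch is to fill from the outcome column instead. The name `mapping` is only an illustrative variable, not something from the question.

```python
import pandas as pd

df = pd.DataFrame({'category': ['High', 'Central', 'Low', 'LowCentral'],
                   'outcome': ['Yes', 'No', 'Yes', 'No']})

# Illustrative name for the lookup table from the question.
mapping = {'High': 'Yes', 'Central': 'Maybe', 'Low': 'No'}

# map() does an exact-match lookup; categories missing from the mapping
# become NaN, and fillna() then restores that row's existing outcome.
df['outcome'] = df['category'].map(mapping).fillna(df['outcome'])
print(df)
```

Here the LowCentral row keeps whatever value it already had in outcome, because 'LowCentral' is not an exact key in the mapping.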

CodePudding user response:

Take a look at pandas.Series.replace. Consider the following example:

import pandas as pd
df = pd.DataFrame({'category': ['High', 'Central', 'Low', 'LowCentral'],'outcome': ['Yes', 'No', 'Yes', 'No']})
df['outcome'] = df['category'].replace({'High':'Yes','Central':'Maybe','Low':'No'})
print(df)

output

     category     outcome
0        High         Yes
1     Central       Maybe
2         Low          No
3  LowCentral  LowCentral

Note that category values with no entry in the mapping are passed through unchanged, so row 3 gets 'LowCentral' as its outcome.

CodePudding user response:

Try this one.

import pandas as pd

df = pd.DataFrame({'category': ['High', 'Central', 'Low', 'LowCentral'], 
               'outcome': ['Yes', 'No', 'Yes', 'No']})
               
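for i, row in df.iterrows():
    # df.loc[i, 'category'] is a plain Python string, so use `in` for substring checks.
    # 'Low' is tested before 'Central', so 'LowCentral' is treated as Low.
    if 'High' in df.loc[i, 'category']:
        df.loc[i, 'outcome'] = 'Yes'
    elif 'Low' in df.loc[i, 'category']:
        df.loc[i, 'outcome'] = 'No'
    elif 'Central' in df.loc[i, 'category']:
        df.loc[i, 'outcome'] = 'Maybe'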
print(df)

[Output]

     category outcome
0        High     Yes
1     Central   Maybe
2         Low      No
3  LowCentral      No
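For context, the original loop failed because df.loc[i, 'category'] returns a single Python string, and the .str accessor only exists on a pandas Series or Index. Applied to the whole column, .str.contains works and avoids the row loop. The following is a rough sketch of the same substring logic without iterrows; the names `rules` and `unassigned` are illustrative, and the rule order is assumed to mirror the if/elif chain above.

```python
import pandas as pd

df = pd.DataFrame({'category': ['High', 'Central', 'Low', 'LowCentral'],
                   'outcome': ['Yes', 'No', 'Yes', 'No']})

# (substring, outcome) pairs, checked in the same order as the if/elif chain.
rules = [('High', 'Yes'), ('Low', 'No'), ('Central', 'Maybe')]

# Tracks rows that no earlier rule has claimed yet (first match wins).
unassigned = pd.Series(True, index=df.index)

for needle, value in rules:
    # .str.contains is called on the whole column (a Series), not on a scalar.
    hit = df['category'].str.contains(needle, regex=False) & unassigned
    df.loc[hit, 'outcome'] = value
    unassigned &= ~hit

print(df)
```

Rows that match none of the substrings are never assigned, so they keep their existing outcome.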