Looping through values in a specific column and changing values in Python and Pandas

I have a dataframe as follows (example is simplified):

id       prediction1          prediction2
1234     Cocker_spaniel       german_Shepard
5678     rhodesian_ridgeback  australian_shepard

I need to remove the underscores and make sure the strings are in lower case so I can search them more easily later.

I am not quite sure how to loop through this. As a student, my initial thought is something like the following:

for row in image_predictions['p1']:
    image_predictions['p1'] = image_predictions['p1'].replace('_', ' ')

The code above is meant to replace the underscores with spaces, and I believe the lowercase step would look similar, using the .lower() method.

Any advice to point me in the right direction?

CodePudding user response:

For in-place modification, you can use:

df.update(df[['prediction1', 'prediction2']]
          .apply(lambda c: c.str.lower()
                            .str.replace('_', ' ', regex=False))
          )

Output:

     id          prediction1         prediction2
0  1234       cocker spaniel      german shepard
1  5678  rhodesian ridgeback  australian shepard
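
To try this on the sample data from the question, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the question's table, and the name `unchanged` is only illustrative. It also shows why the loop in the question appears to do nothing: without regex=True, Series.replace only swaps cells whose entire value equals '_', whereas the .str accessor works on substrings.

```python
import pandas as pd

# Sample data rebuilt from the question's table
df = pd.DataFrame({
    "id": [1234, 5678],
    "prediction1": ["Cocker_spaniel", "rhodesian_ridgeback"],
    "prediction2": ["german_Shepard", "australian_shepard"],
})

# Series.replace compares whole cell values, so no cell matches '_' here
unchanged = df["prediction1"].replace("_", " ")

# The .str accessor handles every cell of a column at once, so no loop is needed
df.update(
    df[["prediction1", "prediction2"]]
    .apply(lambda c: c.str.lower().str.replace("_", " ", regex=False))
)

print(df)
```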

CodePudding user response:

You can use image_predictions['p1'].apply() to apply a function to each cell of the p1 column:

def myFunction(x):
    # Replace underscores with spaces, then lowercase the result
    return x.replace('_', ' ').lower()

image_predictions['p1'] = image_predictions['p1'].apply(myFunction)
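
One caveat with a plain Python function passed to apply: it is called on every cell, so a missing value (which is not a str) would raise an AttributeError on .replace. The sketch below assumes the column might contain such gaps, which the question does not say. The function name normalize_label and the sample row with None are made up for illustration.

```python
import pandas as pd

def normalize_label(x):
    # Only strings have .replace/.lower; pass anything else (e.g. NaN) through
    if isinstance(x, str):
        return x.replace("_", " ").lower()
    return x

# Hypothetical data with one missing prediction
image_predictions = pd.DataFrame(
    {"p1": ["Cocker_spaniel", None, "rhodesian_ridgeback"]}
)

image_predictions["p1"] = image_predictions["p1"].apply(normalize_label)
print(image_predictions)
```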

CodePudding user response:

I wanted to see if it was possible to avoid specifying the columns for replacement. This approach creates a dict that maps A -> a, B -> b, etc., and _ -> space, and then calls replace with regex=True so that the keys are matched inside each string rather than against whole cell values.

import string

replace_dict = dict(zip(string.ascii_uppercase,string.ascii_lowercase))
replace_dict['_'] = ' '

df.replace(replace_dict, regex=True, inplace=True)

print(df)
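
A different way to avoid naming the columns, offered as a sketch rather than a replacement for the dict approach: pick out the text columns by dtype and run the .str methods on them. The sample frame is rebuilt from the question, and text_cols is just an illustrative name. It assumes every string column should be normalized.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

# Sample data rebuilt from the question's table
df = pd.DataFrame({
    "id": [1234, 5678],
    "prediction1": ["Cocker_spaniel", "rhodesian_ridgeback"],
    "prediction2": ["german_Shepard", "australian_shepard"],
})

# Collect the columns that hold strings; the integer id column is skipped
text_cols = [c for c in df.columns if is_string_dtype(df[c])]

# Lowercase and swap underscores for spaces in each text column
df[text_cols] = df[text_cols].apply(
    lambda c: c.str.lower().str.replace("_", " ", regex=False)
)

print(df)
```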