Home > front end >  Pandas .str.replace not replacing all instances in a table
Pandas .str.replace not replacing all instances in a table

Time:12-06

I've got a data frame with several instances of ' *Winner*' after the person's name. I'd like to remove this so just their name remains. A simplified version of this is below:

Data = {'YEAR': [2020, 2020, 2020],
        'NAME': [ 'Tom *Winner*', 'Dick *Winner*', 'Harry *Winner*']}
df = pd.DataFrame(Data)
df['NAME'] = df['NAME'].str.replace(' [*]Winner[*]', '')
df

Which results in this:

    YEAR    NAME
0   2020    Tom
1   2020    Dick *Winner*
2   2020    Harry

So my question is what is different about Dick's ' *Winner*' tag vs. the other two? Why is this one not being removed? Is there a better way to accomplish this?

As an extra confusing note, I just copied my code above and re-ran it (assuming that's how people here might test it) and it does remove Dick's ' *Winner*' tag. So something about my original data is different and is stripped out when pasted here. Not sure how to help reproduce it this being the case.

Edit: Per request from @user17242583, here is the result of df.to_dict

{'NAME': {0: 'Tom *Winner*', 1: 'Dick\xa0*Winner*', 2: 'Harry *Winner*'},
 'YEAR': {0: 2020, 1: 2020, 2: 2020}}

CodePudding user response:

Try this:

df['NAME'] = df['NAME'].str.replace('\s*\*Winner\*', '', regex=True)

Output:

>>> df
   YEAR   NAME
0  2020    Tom
1  2020   Dick
2  2020  Harry

The reason that works is because your data has a \xa0, which is a thin, non-breaking space. In your original code, you have a space hardcoded, but my code uses \s* which means "0 or more of any whitespace characters," which includes \xa0 (char code 160).

  • Related