How to replace a string with another string in a DataFrame?

I have the following DataFrame.

import pandas as pd

data = pd.read_csv("Example.csv")
data["Column1"]

     Column0   Column1

0      a       Gold 
1      b       Silver  
2      b       Silver (Running)
3      c       Bronze (800m)
4      c       Bronze 
5      a       2x Gold (500m)
6      a       Really Successful, 2x WM Gold (500m)

My goal is to replace each string with only the medal it contains.

data = pd.read_csv("Example.csv")
data["Column1"]

     Column0     Column1

0      a         Gold
1      b         Silver
2      b         Silver
3      c         Bronze
4      c         Bronze
5      a         Gold
6      a         Gold

I've already tried the replace() method, but it doesn't work. Like this:

data["Column1"] = data.replace({"Column1": "Silver"}, "Silver")

CodePudding user response：

You can try str.extract

df['Column1'] = df['Column1'].str.extract('(Gold|Silver|Bronze)')

print(df)

  Column0 Column1
0       a    Gold
1       b  Silver
2       b  Silver
3       c  Bronze
4       c  Bronze
5       a    Gold
6       a    Gold

To ignore case, you can use the flags argument:

import re

df['Column1'] = df['Column1'].str.extract('(gold|silver|bronze)', flags=re.IGNORECASE)
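One detail worth noting about the case-insensitive variant: the captured text keeps whatever casing the cell had, so "GOLD" and "gold" stay different values. A minimal sketch, assuming a small made-up frame with mixed-case entries (the sample rows are hypothetical, not from the question's CSV), that normalises the casing afterwards:

```python
import re
import pandas as pd

# Hypothetical sample with mixed-case medal names (not the question's data).
df = pd.DataFrame(
    {"Column1": ["GOLD ", "2x gold (500m)", "Silver (Running)", "no medal"]}
)

# expand=False returns a Series rather than a one-column DataFrame.
medals = df["Column1"].str.extract(
    r"(gold|silver|bronze)", flags=re.IGNORECASE, expand=False
)

# The match keeps its original casing, so normalise it; NaN rows stay NaN.
df["Column1"] = medals.str.capitalize()
print(df)
```

Rows that contain no medal name come back as NaN, which may or may not be what you want.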

CodePudding user response：

Try using:

data["Column1"] = data["Column1"].replace({'Silver (Running)': 'Silver'})
data["Column1"]
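Keep in mind that a plain replace only swaps cells whose entire value equals the key, so every variant would need its own dictionary entry. A rough sketch, assuming a small made-up frame (the sample rows and the variable names exact and by_regex are only for illustration), comparing that with a regex-based replace that keeps just the captured medal:

```python
import pandas as pd

# Hypothetical sample rows modelled on the question's data.
data = pd.DataFrame({"Column1": ["Gold ", "Silver (Running)", "2x Gold (500m)"]})

# Exact replacement: only the cell equal to the whole key is changed.
exact = data["Column1"].replace({"Silver (Running)": "Silver"})

# Regex replacement: the pattern consumes the full cell and keeps group 1.
by_regex = data["Column1"].replace(
    r"^.*(Gold|Silver|Bronze).*$", r"\1", regex=True
)

print(exact.tolist())
print(by_regex.tolist())
```

With regex=True, cells that do not match the pattern are left unchanged rather than turned into NaN.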

CodePudding user response：

You need to define clearly the problem that you want to solve. Your problem here is not a use case for replace: you do not want to replace the whole string, you want to keep only the medal in the column "Column1". You might solve this problem as follows. First, create the data frame:

import pandas as pd

df = pd.DataFrame({"Column0": ["a", "b", "b", "c", "c", "a", "a"], "Column1": [
    "Gold ",
    "Silver  ",
    "Silver (Running)",
    "Bronze (800m)",
    "Bronze ",
    "2x Gold (500m)",
    "Really Successful, 2x WM Gold (500m)",
]})

You can use apply on the column Column1 using the following function

def replace_string_by_medal(string):
    for medal in ["Gold","Silver","Bronze"]:
        if medal in string:
            return medal

df.Column1.apply(replace_string_by_medal)

This returns a column with the values you want, and you can overwrite the column Column1 with it:

df["Column1"] = df.Column1.apply(replace_string_by_medal)

df

    Column0 Column1
0   a       Gold
1   b       Silver
2   b       Silver
3   c       Bronze
4   c       Bronze
5   a       Gold
6   a       Gold
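One caveat with the apply approach: a function like this returns None for cells without a medal and raises a TypeError on missing values, since `in` does not work on a float NaN. A possible variant, assuming you would rather keep such cells (the name medal_or_original and the sample rows are hypothetical):

```python
import pandas as pd

MEDALS = ["Gold", "Silver", "Bronze"]

def medal_or_original(value):
    # Non-string cells (e.g. a missing value) are passed through untouched.
    if not isinstance(value, str):
        return value
    for medal in MEDALS:
        if medal in value:
            return medal
    # No medal found: keep the original text, minus surrounding whitespace.
    return value.strip()

# Hypothetical sample including a row without a medal and a missing value.
df = pd.DataFrame({"Column1": ["2x Gold (500m)", "Bronze ", "DNF ", None]})
df["Column1"] = df["Column1"].apply(medal_or_original)
print(df)
```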

CodePudding user response：

As you have a defined list of possibilities, the easiest is to use str.extract:

df['Column1'] = df['Column1'].str.extract('(Gold|Silver|Bronze)')

output:

  Column0 Column1
0       a    Gold
1       b  Silver
2       b  Silver
3       c  Bronze
4       c  Bronze
5       a    Gold
6       a    Gold
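If the list of possibilities lives in a Python list, the pattern can be built from it instead of being typed by hand. A minimal sketch, assuming a made-up frame with one row that contains no medal (the sample rows and the fallback to the original text are illustrative choices, not part of the answer above):

```python
import re
import pandas as pd

medals = ["Gold", "Silver", "Bronze"]

# re.escape guards against regex metacharacters in the labels.
pattern = "(" + "|".join(re.escape(m) for m in medals) + ")"

# Hypothetical sample rows; the last one contains no medal.
df = pd.DataFrame(
    {"Column1": ["Silver  ", "Really Successful, 2x WM Gold (500m)", "Did not finish"]}
)

extracted = df["Column1"].str.extract(pattern, expand=False)

# Rows without a match come back as NaN; fillna keeps their original text.
df["Column1"] = extracted.fillna(df["Column1"])
print(df)
```

Drop the fillna step if you would rather see NaN for rows without a medal.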