I have the following DataFrame:
data = pd.read_csv("Example.csv")
data["Column1"]
Column0 Column1
0 a Gold
1 b Silver
2 b Silver (Running)
3 c Bronze (800m)
4 c Bronze
5 a 2x Gold (500m)
6 a Really Successful, 2x WM Gold (500m)
My goal is to reduce each string in Column1 to just the medal name:
data = pd.read_csv("Example.csv")
data["Column1"]
Column0 Column1
0 a Gold
1 b Silver
2 b Silver
3 c Bronze
4 c Bronze
5 a Gold
6 a Gold
I've already tried the replace() method, but it doesn't work. Like this:
data["Column1"] = data.replace({"Column1": "Silver"}, "Silver")
CodePudding user response:
You can try str.extract
df['Column1'] = df['Column1'].str.extract('(Gold|Silver|Bronze)')
print(df)
Column0 Column1
0 a Gold
1 b Silver
2 b Silver
3 c Bronze
4 c Bronze
5 a Gold
6 a Gold
To ignore case, you can use the flags argument:
import re
df['Column1'] = df['Column1'].str.extract('(gold|silver|bronze)', flags=re.IGNORECASE)
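One thing to watch with the case-insensitive version is that the extracted text keeps whatever capitalisation the cell had. A minimal sketch of one way to normalise it, assuming the medal names should come out as "Gold", "Silver" or "Bronze"; the sample rows below are made up for illustration and are not from the question's CSV:

```python
import re
import pandas as pd

# Hypothetical sample data with mixed capitalisation and one row without a medal
df = pd.DataFrame({
    "Column0": ["a", "b", "c"],
    "Column1": ["2x GOLD (500m)", "silver (Running)", "No medal"],
})

# expand=False returns a Series instead of a one-column DataFrame
medals = df["Column1"].str.extract(
    r"(gold|silver|bronze)", flags=re.IGNORECASE, expand=False
)

# Normalise capitalisation; rows without a match stay NaN
df["Column1"] = medals.str.capitalize()
print(df)
```

Rows that contain none of the medal names end up as NaN, so they may need a fillna() or a filter afterwards, depending on what you want to do with them.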
CodePudding user response:
Try using:
data["Column1"] = data["Column1"].replace({'Silver (Running)': 'Silver'})
data[Column1]
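Note that a dictionary passed to replace matches whole cell values by default, so every variant would need its own entry. A rough sketch of a regex-based alternative with str.replace, assuming the only medal names are Gold, Silver and Bronze; the small frame here is invented for the example:

```python
import pandas as pd

# Hypothetical sample data, not the question's CSV
data = pd.DataFrame(
    {"Column1": ["Silver", "Silver (Running)", "2x Gold (500m)", "Bronze (800m)"]}
)

# Exact-match replace: only the cell equal to "Silver (Running)" changes
exact = data["Column1"].replace({"Silver (Running)": "Silver"})

# Regex replace: match the whole cell and keep only the captured medal name
pattern = r"^.*?(Gold|Silver|Bronze).*$"
cleaned = data["Column1"].str.replace(pattern, r"\1", regex=True)

print(exact.tolist())
print(cleaned.tolist())
```

Unlike str.extract, cells that do not contain a medal are left unchanged here rather than turned into NaN.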
CodePudding user response:
You need to define clearly the problem that you want to solve.
Your problem here is not a use case for replace: you don't want to swap one whole string for another, you want to keep only the medal name from each value in "Column1".
You might solve this problem as follows.
Creating the data frame:
df = pd.DataFrame({"Column0": ["a","b","b","c","c","a","a",], "Column1":[
"Gold ",
"Silver ",
"Silver (Running)",
"Bronze (800m)",
"Bronze ",
"2x Gold (500m)",
"Really Successful, 2x WM Gold (500m)",
]})
You can use apply on the column Column1 with the following function:
def replace_string_by_medal(string):
    # Return the first medal name found in the string (None if there is none)
    for medal in ["Gold", "Silver", "Bronze"]:
        if medal in string:
            return medal
df.Column1.apply(replace_string_by_medal)
This returns a column with what you want, and you can overwrite Column1 with the new values:
df["Column1"] = df.Column1.apply(replace_string_by_medal)
df
Column0 Column1
0 a Gold
1 b Silver
2 b Silver
3 c Bronze
4 c Bronze
5 a Gold
6 a Gold
CodePudding user response:
As you have a defined list of possibilities, the easiest is to use str.extract:
df['Column1'] = df['Column1'].str.extract('(Gold|Silver|Bronze)')
output:
Column0 Column1
0 a Gold
1 b Silver
2 b Silver
3 c Bronze
4 c Bronze
5 a Gold
6 a Gold
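If the list of possibilities lives in a Python list, the pattern can be built from it instead of being typed by hand. A small sketch, assuming a list named medals (a name introduced here for illustration) and a few of the sample strings from the question:

```python
import re
import pandas as pd

# Hypothetical list of allowed values; extend it as needed
medals = ["Gold", "Silver", "Bronze"]

# re.escape guards against entries that contain regex metacharacters
pattern = "(" + "|".join(re.escape(m) for m in medals) + ")"

df = pd.DataFrame(
    {"Column1": ["Gold ", "Bronze (800m)", "Really Successful, 2x WM Gold (500m)"]}
)

# expand=False yields a Series that can be assigned straight back to the column
df["Column1"] = df["Column1"].str.extract(pattern, expand=False)
print(df)
```

This keeps the regex and the list of valid medals in sync if more values are added later.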