I'd like to replace every '0-4' with '00-04' in the 'tumor-size' column of my DataFrame. The column contains the following values:
print(df['tumor-size'].unique())
["'15-19'" "'35-39'" "'30-34'" "'25-29'" "'40-44'" "'10-14'" "'0-4'" "'20-24'" "'45-49'" "'50-54'" "'5-9'"]
This is what I tried first; nothing changed:
df['tumor-size'] = df['tumor-size'].replace('0-4', '00-04')
Next, I tried the following. In this case, every '0-4' was replaced with '00-04', but every '40-44' was also replaced with '400-044', since '40-44' contains '0-4'.
df['tumor-size'] = df['tumor-size'].str.replace('0-4', '00-04')
I read other Q&As and realized that I need a regex. I then tried the following, since the elements always start with '0-4', but again nothing changed.
df['tumor-size'] = df['tumor-size'].str.replace(r'^0-4', '00-04', regex=True)
What I want to do is quite simple, but I have no idea how to achieve it. Could someone please help? Thank you.
Note: I reload all the data into df from the CSV file on every attempt.
CodePudding user response:
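The values themselves contain literal single quotes (see the unique() output), so the pattern has to include them, and regex=True is needed for the ^ and $ anchors to be treated as anchors. Try:
df['tumor-size'] = df['tumor-size'].replace("^'0-4'$", "'00-04'", regex=True)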
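To illustrate why the quote characters matter, here is a minimal sketch using a small made-up sample shaped like the data in the question (the sample values and the name `fixed` are assumptions for illustration only). Series.replace with plain strings matches whole values, so the search string has to include the quotes that are stored in each value:

```python
import pandas as pd

# Hypothetical sample: each value carries literal single quotes,
# as in the unique() output shown in the question.
df = pd.DataFrame({"tumor-size": ["'15-19'", "'40-44'", "'0-4'", "'5-9'"]})

# Without the quotes nothing matches, because no value is exactly 0-4.
unchanged = df["tumor-size"].replace("0-4", "00-04")

# With the quotes included, only the exact value '0-4' is replaced.
fixed = df["tumor-size"].replace("'0-4'", "'00-04'")

print(unchanged.tolist())
print(fixed.tolist())
```

Because this is a whole-value match, '40-44' is left alone without needing a regex at all.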
CodePudding user response:
You can anchor the pattern with both ^ and $ so that it matches only when the entire value is 0-4:
import pandas as pd

df = pd.DataFrame(data={'tumor-size': ['15-19', '35-39', '30-34', '25-29',
                                       '40-44', '10-14', '0-4', '20-24',
                                       '45-49', '50-54', '5-9']})
# ^ and $ anchor the match to the whole string, so '40-44' is not touched
df['tumor-size'] = df['tumor-size'].str.replace(r'^0-4$', '00-04', regex=True)
print(df)
Output:
    tumor-size
0        15-19
1        35-39
2        30-34
3        25-29
4        40-44
5        10-14
6        00-04
7        20-24
8        45-49
9        50-54
10         5-9
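Note that the values in the question include literal quote characters, so an anchored pattern like ^0-4$ would not match them as they are. One possible approach, sketched here on the assumption that the quotes are unwanted and can be dropped, is to strip them first and then apply the anchored replacement (the sample data and the name `cleaned` are made up for illustration):

```python
import pandas as pd

# Hypothetical sample mirroring the question: values include literal quotes.
df = pd.DataFrame({"tumor-size": ["'15-19'", "'40-44'", "'0-4'", "'5-9'"]})

# Strip the leading and trailing single quotes from every value.
cleaned = df["tumor-size"].str.strip("'")

# Replace only values that are exactly 0-4; ^ and $ anchor the whole string.
cleaned = cleaned.str.replace(r"^0-4$", "00-04", regex=True)

print(cleaned.tolist())
```

If the quotes need to be kept, the pattern can include them instead, for example r"^'0-4'$" with the replacement "'00-04'".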