Converting a Categorical Variable with a % Sign to a Numerical Variable in Python Pandas

import pandas as pd

dt = {'tensile_strength': ['15%', '15%', '20%', '20%', '25%', '25%', '30%', '30%'],
      'cotton_pct': [7, 7, 12, 17, 14, 18, 19, 25]}
mydt = pd.DataFrame(dt, columns=['tensile_strength', 'cotton_pct'])

In the dataset above, 'tensile_strength' is stored as strings with a trailing % sign, so it is read as text rather than as numbers. How do I create a new variable that is a numerical representation of 'tensile_strength'?

CodePudding user response:

The .str accessor exposes string methods that operate on every element of a column, so .str.replace() can strip the % sign from all values at once. Then convert the result to int with .astype() and assign it back to the DataFrame:

mydt['tensile_strength'] = mydt['tensile_strength'].str.replace("%", '').astype('int')
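As a minimal, self-contained sketch of the approach above: the regex=False argument and the tensile_strength_frac column are additions for illustration, not part of the original answer. regex=False makes the "%" match a literal character, and the extra column shows one way to store the same values as fractions.

```python
import pandas as pd

mydt = pd.DataFrame({
    'tensile_strength': ['15%', '15%', '20%', '20%', '25%', '25%', '30%', '30%'],
    'cotton_pct': [7, 7, 12, 17, 14, 18, 19, 25],
})

# regex=False treats "%" as a literal character (an explicit choice added here)
as_int = mydt['tensile_strength'].str.replace('%', '', regex=False).astype('int')

# Hypothetical extra column: the same values expressed as a fraction of 1
mydt['tensile_strength_frac'] = as_int / 100

# Overwrite the original string column with the integer values
mydt['tensile_strength'] = as_int

print(mydt.dtypes)
```

Note that .astype('int') raises an error if any value cannot be parsed as an integer, so this form suits columns that are known to be clean.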

CodePudding user response:

You can use:

mydt['new_col'] = pd.to_numeric(mydt['tensile_strength'].str.strip('%'))

Note: this writes the result to a new column, but you can of course overwrite tensile_strength instead.

output:

  tensile_strength  cotton_pct  new_col
0              15%           7       15
1              15%           7       15
2              20%          12       20
3              20%          17       20
4              25%          14       25
5              25%          18       25
6              30%          19       30
7              30%          25       30
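If the column might contain stray whitespace, decimals, or non-numeric entries, pd.to_numeric accepts errors='coerce', which turns unparseable values into NaN instead of raising. The sketch below uses a hypothetical messy Series (not data from the question) to show the idea:

```python
import pandas as pd

# Hypothetical messy input: padding, a decimal, a missing value, a non-numeric entry
raw = pd.Series(['15%', ' 20 %', '22.5%', None, 'n/a'])

# Trim whitespace, drop a trailing "%", then trim again in case a space preceded it
cleaned = raw.str.strip().str.rstrip('%').str.strip()

# errors='coerce' converts anything that is not a number into NaN
values = pd.to_numeric(cleaned, errors='coerce')

print(values)
```

Because NaN is a floating-point value, the result here is a float column rather than an integer one.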