Home > Enterprise >  String formatting of data in column of pandas dataframe
String formatting of data in column of pandas dataframe

Time:09-23

How can I remove '$' in a column values?
Example: I have a column with values like $40, $23, $35,
I want to see those column values like 40, 23, 35.

I have already tried:

DF_['H1 % Change'] = DF_['H1 % Change'].replace(r'$', '')

This didn't work!

CodePudding user response:

Series.replace replaces all values in the column that exactly match the first argument. You want Series.str.replace to replace a substring:

DF_['H1 % Change'] = DF_['H1 % Change'].str.replace(r'$', '')

CodePudding user response:

You can iterate over all elements in you vector:

DF_['H1 % Change'] = [x.replace(r'$', '') for x in DF_['H1 % Change'] ]

This will replace elements one-by-one

CodePudding user response:

You can still do it with .replace() with supplying the parameter regex=, as follows:

DF_['H1 % Change'] = DF_['H1 % Change'].replace(r'\$', '', regex=True)

Demo

Data Input

DF_ = pd.DataFrame({'H1 % Change': ['$40', '$23', '$35']})

print(DF_)

  H1 % Change
0         $40
1         $23
2         $35

Output

DF_['H1 % Change'] = DF_['H1 % Change'].replace(r'\$', '', regex=True)

print(DF_)

  H1 % Change
0          40
1          23
2          35

Here, with specifying regex=True, the first parameter is regarded as a regex. Hence, we need to quote the regex as \$ (with a backslash) in order to "escape" the $ symbol which, by coincident, is also a regex metacharacter (for end-of string anchor).

Detailed Explanation

Some more background information below for your reference:

Your code with .replace() doesn't work because by default, regex=False for .replace() and with this setting, you can see from the official document (extracted below):

Parameters to_replace str, regex, list, dict, Series, int, float, or None

str: string exactly matching to_replace will be replaced with value

regex: regexs matching to_replace will be replaced with value

that when we supply the first parameter to_replace with a string (with default regex=False), it requires exact matching (full-match) string in order to replace. That is, if you specify $, then no other characters are allowed in the testing string, in order to get a match.

When it is a regex (with regex=True), testing string matching the regex (can be partial match) will be regarded as a match.

Alternate Solution

Another way to remove the dollar sign $ is by using .str.replace(), which has by default regex=True. Hence, we can do it simply by:

DF_['H1 % Change'] = DF_['H1 % Change'].str.replace(r'\$', '')     

or simply:

DF_['H1 % Change'] = DF_['H1 % Change'].str.replace(r'$', '')     

Here, it turns out that we can use both \$ (with backslash) and $ (without backslash) to achieve the same result.

For people using newer version of Pandas, you may get a warning message if you don't specify the parameter regex=:

FutureWarning: The default value of regex will change from True to False in a future version.

Hence, it is recommended to quote also the parameter regex= as well, like below:

DF_['H1 % Change'] = DF_['H1 % Change'].str.replace(r'\$', '', regex=True)     
  • Related