I have below the code
import pandas as pd
df = pd.DataFrame({'A': ['$5,756', '3434', '$45', '1,344']})
pattern = ','.join(['$', ','])
df['A'] = df['A'].str.replace('$|,', '', regex=True)
print(df['A'])
What I am trying to remove every occurrence of '$' or ','... so I am trying to replace with blank..
But its replacing only ,
Output I am getting
0 $5756
1 3434
2 $45
3 1344$
it should be
0 5756
1 3434
2 45
3 1344
What I am doing wrong
Any help appreciated
Thanks
CodePudding user response:
If you only have integer-like numbers an easy option is to remove all but digits \D
, then you don't have to deal with other special regex characters like $
:
df['A'] = df['A'].str.replace(r'\D', '', regex=True)
output:
A
0 5756
1 3434
2 45
3 1344
CodePudding user response:
Use:
import pandas as pd
df = pd.DataFrame({'A': ['$5,756', '3434', '$45', '1,344']})
df['A'] = df['A'].str.replace('[$,]', '', regex=True)
print(df)
Output
A
0 5756
1 3434
2 45
3 1344
The problem is that the character $
has a special meaning in regular expressions. From the documentation (emphasis mine):
$
Matches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline. foo matches both ‘foo’ and ‘foobar’, while the regular expression foo$ matches only ‘foo’. More interestingly, searching for foo.$ in 'foo1\nfoo2\n' matches ‘foo2’ normally, but ‘foo1’ in MULTILINE mode; searching for a single $ in 'foo\n' will find two (empty) matches: one just before the newline, and one at the end of the string.mode; searching for a single $ in 'foo\n' will find two (empty) matches: one just before the newline, and one at the end of the string.
So you need to escape the character or put it inside a character class.
As an alternative use:
df['A'].str.replace('\$|,', '', regex=True) # note the escaping \
CodePudding user response:
it might be useful for you :
import pandas as pd
df = pd.DataFrame({'A': ['$5,756', '3434', '$45', '1,344']})
df['A'] = df['A'].str.replace('$', '', regex=True)
print(df['A'])