I know this is a very basic question but I just can't do it.
My data has 20k rows, 10 rows have '$' and ','.
1797 5500
1798 255
1799 3600
1800 $0
1801 $2,620
1802 $6,000
1803 $2,200
1804 $390
1805 $2,688
1806 $13,069
1807 $7,000
1808 $5,000
1809 $2,089
1810 936
1811 5468
1812 20000
I tried the following code but its taking way too long. And is giving me SettingWithCopyWarning warning.
i = 0
df['R3_v2'] = ''
while i < len(df['ID']) 1:
if "$" in df['R3'][i]:
df['R3_v2'][i] = df['R3'][i].replace('$', '')
if "," in df['R3'][i]:
df['R3_v2'][i] = df['R3'][i].replace(',', '')
if "$" or "," not in df['R3'][i]:
df['R3_v2'][i] = df['R3'][i]
i = i 1
if i == len(df['ID']):
break
If I remove the last "if" statement (ie, not in), then the code will run fast. But my R3_v2 for those rows would be empty unless I replace it directly.
Can someone tell me what's wrong? Thanks
CodePudding user response:
Use regex with character class:
df['R3_v2'] = df['R3'].str.replace('[$,]', '', regex=True)
Output:
idx R3 R3_v2
0 1797 5500 5500
1 1798 255 255
2 1799 3600 3600
3 1800 $0 0
4 1801 $2,620 2620
5 1802 $6,000 6000
6 1803 $2,200 2200
7 1804 $390 390
8 1805 $2,688 2688
9 1806 $13,069 13069
10 1807 $7,000 7000
11 1808 $5,000 5000
12 1809 $2,089 2089
13 1810 936 936
14 1811 5468 5468
15 1812 20000 20000
CodePudding user response:
Try this:
df['R3'] = df['R3'].str.replace('$', '')
df['R3'] = df['R3'].str.replace(',', '')
Also covered here.