I have a pandas DataFrame, car_sales, that looks like this:
Make Colour Odometer (KM) Doors Price
0 Toyota White 150043 4 $4,000
1 Honda Red 87899 4 $5,000
2 Toyota Blue 32549 3 $7,000
3 BMW Black 11179 5 $22,000
4 Nissan White 213095 4 $3,500
5 Toyota Green 99213 4 $4,500
6 Honda Blue 45698 4 $7,500
7 Honda Blue 54738 4 $7,000
8 Toyota White 60000 4 $6,250
9 Nissan White 31600 4 $9,700
I want to remove the $ and , characters from the Price column.
For example, $4,000 should become 4000.
I have written the following code:
car_sales['Price'] = car_sales['Price'].str.replace('[\$, \,]', '')
But Jupyter Notebook shows this warning (not an error):
FutureWarning: The default value of regex will change from True to False in a future version.
car_sales['Price'] = car_sales['Price'].str.replace('[\$, \,]', '')
CodePudding user response:
Here is one way to do it: replace every non-digit character with an empty string using a regex.
df['Price'] = df['Price'].str.replace(r'\D', "", regex=True)
Make Colour Odometer (KM) Doors Price
0 0 Toyota White 150043 4 4000
1 1 Honda Red 87899 4 5000
2 2 Toyota Blue 32549 3 7000
3 3 BMW Black 11179 5 22000
4 4 Nissan White 213095 4 3500
5 5 Toyota Green 99213 4 4500
6 6 Honda Blue 45698 4 7500
7 7 Honda Blue 54738 4 7000
8 8 Toyota White 60000 4 6250
9 9 Nissan White 31600 4 9700
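As a quick check of the \D approach, here is a minimal sketch on a small made-up Series (the names prices, cleaned, as_int and with_cents are illustrative, not from the question). One caveat: \D strips every non-digit, so it would also drop a decimal point or a minus sign. That is harmless for whole-dollar prices like these, but the last line shows what happens when cents are present.

```python
import pandas as pd

# Hypothetical sample, not the full car_sales table
prices = pd.Series(["$4,000", "$22,000", "$3,500"])

# \D matches any character that is not a digit
cleaned = prices.str.replace(r"\D", "", regex=True)

# Convert the cleaned strings to integers so they can be used in arithmetic
as_int = cleaned.astype(int)
print(as_int.tolist())

# Caveat: the decimal point is a non-digit too, so it is removed as well
with_cents = pd.Series(["$1,234.50"]).str.replace(r"\D", "", regex=True)
print(with_cents.tolist())
```

If the column can contain decimals or negative values, a narrower pattern that removes only $ and , is probably the safer choice.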
CodePudding user response:
Here is a regex that will work, along with an example showing that passing the regex
argument explicitly stops the warning (a raw string keeps Python from treating \$ as an escape sequence):
car_sales['Price'] = car_sales['Price'].str.replace(r'\$|,', '', regex=True)
Output:
Make Colour Odometer (KM) Doors Price
0 0 Toyota White 150043 4 4000
1 1 Honda Red 87899 4 5000
2 2 Toyota Blue 32549 3 7000
3 3 BMW Black 11179 5 22000
4 4 Nissan White 213095 4 3500
5 5 Toyota Green 99213 4 4500
6 6 Honda Blue 45698 4 7500
7 7 Honda Blue 54738 4 7000
8 8 Toyota White 60000 4 6250
9 9 Nissan White 31600 4 9700
The pattern '\$|,' matches either a $ character or a , character: | means "or", and the backslash escapes $, which would otherwise be the end-of-string anchor.
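The alternation above can also be written as a character class. The sketch below, on a small made-up Series (prices, alternation and char_class are illustrative names), suggests the two forms give the same result. Inside square brackets $ is a literal character, so it needs no backslash there.

```python
import pandas as pd

# Hypothetical sample, not the full car_sales table
prices = pd.Series(["$4,000", "$22,000", "$6,250"])

# Alternation: match a literal $ (escaped) or a comma
alternation = prices.str.replace(r"\$|,", "", regex=True)

# Character class: match any single character listed in the brackets
char_class = prices.str.replace(r"[$,]", "", regex=True)

print(alternation.tolist())
print(char_class.tolist())
```

This also suggests the pattern in the question was not the problem in itself; passing regex=True explicitly is what silences the warning.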
CodePudding user response:
You can try this, which treats $ and , as literal strings instead of regex patterns:
car_sales['Price'] = car_sales['Price'].str.replace("$","", regex=False)
car_sales['Price'] = car_sales['Price'].str.replace(",","", regex=False)
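To see why regex=False matters here, the sketch below chains the two literal replacements on a made-up Series and then uses Python's re module to show what "$" means when it is interpreted as a regex. The variable names are illustrative only.

```python
import re

import pandas as pd

# Hypothetical sample, not the full car_sales table
prices = pd.Series(["$4,000", "$5,000"])

# With regex=False both arguments are plain substrings, so no escaping is needed
literal = (
    prices.str.replace("$", "", regex=False)
          .str.replace(",", "", regex=False)
)
print(literal.tolist())

# As a regex, "$" is the end-of-string anchor, so nothing visible is removed
unchanged = re.sub("$", "", "$4,000")
print(unchanged)
```

Chaining the calls avoids assigning the column twice, but the two-line version in the answer does the same work.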
CodePudding user response:
You can follow this tutorial: https://datatofish.com/replace-character-pandas-dataframe/
This is an experiment I made:
import pandas as pd

# Note: this sample uses '.' as the thousands separator instead of ','
car_sales = {
    'Make': ['Toyota', 'Honda', 'Toyota', 'BMW', 'Nissan', 'Toyota', 'Honda', 'Honda', 'Toyota', 'Nissan'],
    'Colour': ['White', 'Honda', 'Blue', 'Black', 'White', 'Green', 'Blue', 'Blue', 'White', 'White'],
    'Odometer (KM)': ['150043', '87899', '32549', '11179', '213095', '99213', '45698', '54738', '60000', '3160'],
    'Doors': ['4', '4', '3', '5', '4', '4', '4', '4', '4', '4'],
    'Price': ['$4.000', '$5.000', '$7.000', '$22.000', '$3.500', '$4.500', '$7.500', '$7.000', '$6.250', '$9.700']
}
data_car = pd.DataFrame(car_sales, columns=['Make', 'Colour', 'Odometer (KM)', 'Doors', 'Price'])
# regex=False makes '$' and '.' literal characters rather than regex metacharacters
data_car['Price'] = data_car['Price'].str.replace('$', '', regex=False)
data_car['Price'] = data_car['Price'].str.replace('.', '', regex=False)
data_car
This is the result:
Make Colour Odometer (KM) Doors Price
0 Toyota White 150043 4 4000
1 Honda Honda 87899 4 5000
2 Toyota Blue 32549 3 7000
3 BMW Black 11179 5 22000
4 Nissan White 213095 4 3500
5 Toyota Green 99213 4 4500
6 Honda Blue 45698 4 7500
7 Honda Blue 54738 4 7000
8 Toyota White 60000 4 6250
9 Nissan White 3160 4 9700
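Because this experiment uses . as the thousands separator, it is worth being explicit about literal matching: as a regex, . matches any character. The sketch below, on a made-up two-row Series (all names are illustrative), removes the characters literally, converts the result to numbers with pd.to_numeric, and uses the re module to show what the regex interpretation of "." would do.

```python
import re

import pandas as pd

# Hypothetical sample using '.' as the thousands separator
prices = pd.Series(["$4.000", "$22.000"])

# Literal removal of '$' and '.'
cleaned = (
    prices.str.replace("$", "", regex=False)
          .str.replace(".", "", regex=False)
)

# Turn the digit strings into numbers
numbers = pd.to_numeric(cleaned)
print(numbers.tolist())

# As a regex, "." matches every character, so the whole string is wiped out
wiped = re.sub(".", "", "$4.000")
print(repr(wiped))
```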
Good luck.