Replacing each occurrence of pattern in a dataframe

Time:06-29

I have a pandas DataFrame called car_sales that looks like this:

     Make Colour  Odometer (KM)  Doors     Price
0  Toyota  White         150043      4   $4,000 
1   Honda    Red          87899      4   $5,000 
2  Toyota   Blue          32549      3   $7,000 
3     BMW  Black          11179      5  $22,000 
4  Nissan  White         213095      4   $3,500 
5  Toyota  Green          99213      4   $4,500 
6   Honda   Blue          45698      4   $7,500 
7   Honda   Blue          54738      4   $7,000 
8  Toyota  White          60000      4   $6,250 
9  Nissan  White          31600      4   $9,700 

I want to remove the $ and , characters from the Price column.

For example, $4,000 should become 4000.

I wrote the following code:

car_sales['Price'] = car_sales['Price'].str.replace('[\$, \,]', '')

But Jupyter Notebook shows the following warning:

FutureWarning: The default value of regex will change from True to False in a future version.
  car_sales['Price'] = car_sales['Price'].str.replace('[\$, \,]', '')

CodePudding user response:

Here is one way to do it: replace every non-digit character with an empty string using a regex.

car_sales['Price'] = car_sales['Price'].str.replace(r'\D', "", regex=True)

Output:

     Make Colour  Odometer (KM)  Doors  Price
0  Toyota  White         150043      4   4000
1   Honda    Red          87899      4   5000
2  Toyota   Blue          32549      3   7000
3     BMW  Black          11179      5  22000
4  Nissan  White         213095      4   3500
5  Toyota  Green          99213      4   4500
6   Honda   Blue          45698      4   7500
7   Honda   Blue          54738      4   7000
8  Toyota  White          60000      4   6250
9  Nissan  White          31600      4   9700
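One thing to keep in mind with \D: it matches every non-digit character, so it removes more than $ and , alone. Here is a minimal sketch of that, using a small hypothetical Series (the third value, with cents, is not in the question's data and is only there to show the side effect):

```python
import pandas as pd

# Hypothetical sample prices; the last one has cents, unlike the question's data.
prices = pd.Series(["$4,000 ", "$22,000 ", "$1,250.50"])

# \D matches every non-digit character, so "$", ",", spaces and "." are all removed.
digits_only = prices.str.replace(r"\D", "", regex=True)

# Only digits are left, so the strings can be converted to integers.
# Note that "$1,250.50" has become 125050 because the decimal point was removed too.
as_int = digits_only.astype(int)

print(as_int.tolist())
```

For whole-dollar prices like the ones in the question this is fine. If the column could hold decimals or negative numbers, a narrower pattern is probably safer.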

CodePudding user response:

Here is a regex that works. Passing the regex argument explicitly also stops the warning from being raised:

car_sales['Price'] = car_sales['Price'].str.replace(r'\$|,', '', regex=True)

Output:

     Make Colour  Odometer (KM)  Doors  Price
0  Toyota  White         150043      4   4000
1   Honda    Red          87899      4   5000
2  Toyota   Blue          32549      3   7000
3     BMW  Black          11179      5  22000
4  Nissan  White         213095      4   3500
5  Toyota  Green          99213      4   4500
6   Honda   Blue          45698      4   7500
7   Honda   Blue          54738      4   7000
8  Toyota  White          60000      4   6250
9  Nissan  White          31600      4   9700

The pattern '\$|,' matches either a literal $ character (escaped with a backslash, because an unescaped $ is the end-of-string anchor in a regex) or a , character; | means "or".
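The same match can also be written as a character class, and the cleaned strings can then be converted to numbers. This is only a sketch: the three-row frame is a cut-down stand-in for the question's data, and the .str.strip() step assumes the values carry a trailing space as they appear to in the question.

```python
import pandas as pd

# Cut-down stand-in for the question's frame (trailing spaces kept on purpose).
car_sales = pd.DataFrame({"Price": ["$4,000 ", "$5,000 ", "$22,000 "]})

# [$,] is a character class: inside the brackets "$" is literal and needs no escape.
cleaned = car_sales["Price"].str.replace(r"[$,]", "", regex=True)

# The replace leaves the trailing space alone, so strip it before converting.
car_sales["Price"] = cleaned.str.strip().astype(int)

print(car_sales["Price"].tolist())
```

After this the Price column holds integers, not strings, so things like sorting or .mean() behave numerically.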

CodePudding user response:

You can try this:

car_sales['Price'] = car_sales['Price'].str.replace("$","", regex=False)
car_sales['Price'] = car_sales['Price'].str.replace(",","", regex=False)
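With regex=False the pattern is treated as a plain substring, which matters for $ because in a regex that character is an anchor and not a dollar sign. The sketch below uses Python's built-in re module and str.replace on a single string to show the difference; it illustrates the regex rule itself and is not a claim about how any particular pandas version handles a one-character pattern.

```python
import re

price = "$4,000"

# As a regex, an unescaped "$" matches the end of the string (an empty position),
# so substituting it removes nothing.
unescaped = re.sub("$", "", price)

# Escaping it matches the literal dollar sign.
escaped = re.sub(r"\$", "", price)

# Plain substring replacement needs no escaping at all, which is what
# regex=False asks for in the two lines above.
literal = price.replace("$", "").replace(",", "")

print(unescaped, escaped, literal)
```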

CodePudding user response:

You can follow this tutorial: https://datatofish.com/replace-character-pandas-dataframe/

Here is an experiment I ran:

import pandas as pd

car_sales = {
  'Make': ['Toyota', 'Honda', 'Toyota', 'BMW', 'Nissan', 'Toyota', 'Honda', 'Honda', 'Toyota', 'Nissan'],
  'Colour': ['White', 'Honda', 'Blue', 'Black', 'White', 'Green', 'Blue', 'Blue', 'White', 'White'],
  'Odometer (KM)': ['150043', '87899', '32549', '11179', '213095', '99213', '45698', '54738', '60000', '3160'],
  'Doors': ['4', '4', '3', '5', '4', '4', '4', '4', '4', '4'],
  'Price': ['$4.000', '$5.000', '$7.000', '$22.000', '$3.500', '$4.500', '$7.500', '$7.000', '$6.250', '$9.700']
}

data_car = pd.DataFrame(car_sales, columns = ['Make', 'Colour', 'Odometer (KM)', 'Doors', 'Price'])

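# regex=False treats '$' and '.' as plain characters, not regex metacharacters
data_car['Price'] = data_car['Price'].str.replace('$', '', regex=False)
data_car['Price'] = data_car['Price'].str.replace('.', '', regex=False)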

data_car

Out:

     Make Colour Odometer (KM) Doors  Price
0  Toyota  White        150043     4   4000
1   Honda  Honda         87899     4   5000
2  Toyota   Blue         32549     3   7000
3     BMW  Black         11179     5  22000
4  Nissan  White        213095     4   3500
5  Toyota  Green         99213     4   4500
6   Honda   Blue         45698     4   7500
7   Honda   Blue         54738     4   7000
8  Toyota  White         60000     4   6250
9  Nissan  White          3160     4   9700

Good luck!
