I am doing some data visualization with matplotlib. I import a .csv file looking like this:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Month#       12 non-null     int64 
 1   Face_Cream   12 non-null     int64 
 2   Face_Wash    12 non-null     int64 
 3   Toothpaste   12 non-null     int64 
 4   Bath_Soap    12 non-null     int64 
 5   Shampoo      12 non-null     int64 
 6   Moisturizer  12 non-null     int64 
 7   Total_Units  12 non-null     int64 
 8   Profit       12 non-null     object
dtypes: int64(8), object(1)
memory usage: 992.0 bytes

No matter what I do, I cannot convert the 'Profit' column to float. Its values originally contained '$' and whitespace, which I have removed with:

df.Profit # before
0     $181,660.60 
1     $177,954.70 
2     $169,498.45 
3     $166,075.80 
4     $173,176.85 
5     $201,538.70 
6     $190,267.00 
7     $151,039.35 
8     $197,819.60 
9     $161,810.55 
10    $187,298.65 
11    $196,434.70 
Name: Profit, dtype: object

df.Profit = [num.replace('$', '').replace(' ', '').replace("'", "") for num in df.Profit]

df.Profit # after
0     181,660.60
1     177,954.70
2     169,498.45
3     166,075.80
4     173,176.85
5     201,538.70
6     190,267.00
7     151,039.35
8     197,819.60
9     161,810.55
10    187,298.65
11    196,434.70
Name: Profit, dtype: object

I have tried both the astype() and convert_dtypes() methods, but the column stays as object. What am I missing?

   Month#  Face_Cream  Face_Wash  Moisturizer  Total_Units       Profit
1       2        2090       1390         1720        24600  $177,954.70
2       3        2280       1280         2020        23390  $169,498.45
3       4        3340       1890         1550        23020  $166,075.80
4       5        2820       1550         1860        23960  $173,176.85

CodePudding user response:

You can cast the values directly to float inside the list comprehension. You also need to replace "," with "", because float() cannot parse a string that contains a thousands separator.

import pandas as pd
df = pd.DataFrame({'Profit': ['$181,660.60', '$177,954.70', '$169,498.45', '$166,075.80', '$173,176.85']})
# Strip '$', spaces, quotes and the ',' thousands separator, then convert each string to float
df.Profit = [float(num.replace('$', '').replace(' ', '').replace("'", "").replace(",", "")) for num in df.Profit]


#   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Profit  5 non-null      float64
dtypes: float64(1)

0   181660.60
1   177954.70
2   169498.45
3   166075.80
4   173176.85
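
As an alternative to the list comprehension, the same cleanup can probably be done with pandas' vectorized string methods. This is only a sketch: the three sample values are copied from the question (including the trailing space), and the regular expression assumes that '$', ',' and whitespace are the only non-numeric characters in the column. Once they are gone, astype(float) should work as the question expected.

```python
import pandas as pd

# Sample values copied from the question, including the trailing space.
df = pd.DataFrame({"Profit": ["$181,660.60 ", "$177,954.70 ", "$169,498.45 "]})

# Remove every '$', ',' and whitespace character in one pass.
# Inside a character class, '$' is a literal dollar sign.
cleaned = df["Profit"].str.replace(r"[$,\s]", "", regex=True)

# With the thousands separators gone, the strings are valid float literals.
df["Profit"] = cleaned.astype(float)

print(df["Profit"].dtype)
```

If other characters can appear in the real file (for example parentheses for negative amounts), the pattern would need to be adjusted accordingly.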
