convert object to float fails because '0-CodePudding

array([10.703, 63.756, 18.48, 5.313, 14.553, 6.853, 7.7886, 7.3304, 6.083,
   9.163, 45.276, 100.716, 4.543, 8.2467, 15.3384, 9.7598, 14.3798,
   12.0698, 10.9148, 7.623, 3.8423, 8.4893, 6.9223, 14.0718, 47.0393,
   20.79, 15.939, 7.7, 13.629, "'0.7469", 1.1165, "'0.5352",
   "'0.3889", "'0.1001", "'0.3388", 1.7402, "'0.5313", 1.3052,
   "'0.7893", "'0.5583", 4.8895, 2.3524, 7.6538, "'0.0809", "'0.3504",
   5.8366, 1.0126, 3.7884, 7.5576, 3.0107, "'0.0347", 7.6384, 1.3552,
   4.2928, 1.7364, 1.0857, "'0.7123", 3.1108, "'0.5968", "'0.5544",
   1.6093, "'0.1887", "'0.6661", "'0.6391", 1.3514, "'0.9548"])

Excel automatically adds " ' " to the values below 1. I would like to convert the column into float type,

df['vat_amount']=df['vat_amount'].astype(float)

returns

ValueError: could not convert string to float: "'0.5352"

To erase the " ' " sign before 0, I tried:

df.vat_amount.replace("'0.",",0.")

It returns the same value: '0.5352

How could I properly get rid of the ' sign when the numbers are < 1 ?

All the columns are object type.

As this process would be automated, please suggest only Python solutions, I do not want to go to Excel and replace these values every x period of days.

Could anyone provide the proposed solution via dbfiddle?

CodePudding user response：

The replace function has to be applied for each entry and then written back.

This should work:

df.vat_amount = df.vat_amount.apply(lambda x: x.replace(f"'0.","0."))

Edit: If not all entries in df.vat_amount are string try this:

df.vat_amount = df.vat_amount.apply(lambda x: x.replace(f"'0.","0.") if type(x) is str else x)

CodePudding user response：

The reason it doesnt work is because replace doesn't work as with a standard string in python. In a pandas dataframe, replace it matches the whole value and not just part of it as I understand it.

I would suggest using a regular expression with the pandas replace function to remove the '

import pandas as pd

df = pd.DataFrame({'vat_amount': [10.703, 63.756, 18.48, 5.313, 14.553, 6.853, 7.7886, 7.3304, 6.083,
   9.163, 45.276, 100.716, 4.543, 8.2467, 15.3384, 9.7598, 14.3798,
   12.0698, 10.9148, 7.623, 3.8423, 8.4893, 6.9223, 14.0718, 47.0393,
   20.79, 15.939, 7.7, 13.629, "'0.7469", 1.1165, "'0.5352",
   "'0.3889", "'0.1001", "'0.3388", 1.7402, "'0.5313", 1.3052,
   "'0.7893", "'0.5583", 4.8895, 2.3524, 7.6538, "'0.0809", "'0.3504",
   5.8366, 1.0126, 3.7884, 7.5576, 3.0107, "'0.0347", 7.6384, 1.3552,
   4.2928, 1.7364, 1.0857, "'0.7123", 3.1108, "'0.5968", "'0.5544",
   1.6093, "'0.1887", "'0.6661", "'0.6391", 1.3514, "'0.9548"]})

df = df.replace(to_replace="(['])", value="", regex=True) #For whole dataframe

Or if you just want one column:

df.vat_amount = df.vat_amount.replace(to_replace="(['])", value="", regex=True)

This effectively removes the value that was giving you problem by searching only the string values, matching the ', and replacing it by nothing. It is a relatively simple one-line solution.

CodePudding user response：

I believe there is a better solution but temporarily you can use something like:

a=([10.703, 63.756, 18.48, 5.313, 14.553, 6.853, 7.7886, 7.3304, 6.083,
   9.163, 45.276, 100.716, 4.543, 8.2467, 15.3384, 9.7598, 14.3798,
   12.0698, 10.9148, 7.623, 3.8423, 8.4893, 6.9223, 14.0718, 47.0393,
   20.79, 15.939, 7.7, 13.629, "'0.7469", 1.1165, "'0.5352",
   "'0.3889", "'0.1001", "'0.3388", 1.7402, "'0.5313", 1.3052,
   "'0.7893", "'0.5583", 4.8895, 2.3524, 7.6538, "'0.0809", "'0.3504",
   5.8366, 1.0126, 3.7884, 7.5576, 3.0107, "'0.0347", 7.6384, 1.3552,
   4.2928, 1.7364, 1.0857, "'0.7123", 3.1108, "'0.5968", "'0.5544",
   1.6093, "'0.1887", "'0.6661", "'0.6391", 1.3514, "'0.9548"])


import pandas as pd
df=pd.DataFrame(a,columns=['value'])

for i,j in df.iterrows():
    
     try:
         if j['value'].find("'") != -1:
              df.loc[i, 'value'] = j['value'].replace("'","")
         else:
             pass
     except:
         pass

df['value']=df['value'].astype(float)

simply, check the value for each row, if there is a quote replace it, do nothing otherwise.