Home > Software design >  convert object to float fails because '0
convert object to float fails because '0

Time:10-06

array([10.703, 63.756, 18.48, 5.313, 14.553, 6.853, 7.7886, 7.3304, 6.083,
   9.163, 45.276, 100.716, 4.543, 8.2467, 15.3384, 9.7598, 14.3798,
   12.0698, 10.9148, 7.623, 3.8423, 8.4893, 6.9223, 14.0718, 47.0393,
   20.79, 15.939, 7.7, 13.629, "'0.7469", 1.1165, "'0.5352",
   "'0.3889", "'0.1001", "'0.3388", 1.7402, "'0.5313", 1.3052,
   "'0.7893", "'0.5583", 4.8895, 2.3524, 7.6538, "'0.0809", "'0.3504",
   5.8366, 1.0126, 3.7884, 7.5576, 3.0107, "'0.0347", 7.6384, 1.3552,
   4.2928, 1.7364, 1.0857, "'0.7123", 3.1108, "'0.5968", "'0.5544",
   1.6093, "'0.1887", "'0.6661", "'0.6391", 1.3514, "'0.9548"])

Excel automatically adds " ' " to the values below 1. I would like to convert the column into float type,

df['vat_amount']=df['vat_amount'].astype(float)

returns

ValueError: could not convert string to float: "'0.5352"

To erase the " ' " sign before 0, I tried:

df.vat_amount.replace("'0.",",0.")

It returns the same value: '0.5352

How could I properly get rid of the ' sign when the numbers are < 1 ?

All the columns are object type.

As this process would be automated, please suggest only Python solutions, I do not want to go to Excel and replace these values every x period of days.

Could anyone provide the proposed solution via dbfiddle?

CodePudding user response:

The replace function has to be applied for each entry and then written back.

This should work:

df.vat_amount = df.vat_amount.apply(lambda x: x.replace(f"'0.","0."))

Edit: If not all entries in df.vat_amount are string try this:

df.vat_amount = df.vat_amount.apply(lambda x: x.replace(f"'0.","0.") if type(x) is str else x)

CodePudding user response:

The reason it doesnt work is because replace doesn't work as with a standard string in python. In a pandas dataframe, replace it matches the whole value and not just part of it as I understand it.

I would suggest using a regular expression with the pandas replace function to remove the '

import pandas as pd

df = pd.DataFrame({'vat_amount': [10.703, 63.756, 18.48, 5.313, 14.553, 6.853, 7.7886, 7.3304, 6.083,
   9.163, 45.276, 100.716, 4.543, 8.2467, 15.3384, 9.7598, 14.3798,
   12.0698, 10.9148, 7.623, 3.8423, 8.4893, 6.9223, 14.0718, 47.0393,
   20.79, 15.939, 7.7, 13.629, "'0.7469", 1.1165, "'0.5352",
   "'0.3889", "'0.1001", "'0.3388", 1.7402, "'0.5313", 1.3052,
   "'0.7893", "'0.5583", 4.8895, 2.3524, 7.6538, "'0.0809", "'0.3504",
   5.8366, 1.0126, 3.7884, 7.5576, 3.0107, "'0.0347", 7.6384, 1.3552,
   4.2928, 1.7364, 1.0857, "'0.7123", 3.1108, "'0.5968", "'0.5544",
   1.6093, "'0.1887", "'0.6661", "'0.6391", 1.3514, "'0.9548"]})

df = df.replace(to_replace="(['])", value="", regex=True) #For whole dataframe

Or if you just want one column:

df.vat_amount = df.vat_amount.replace(to_replace="(['])", value="", regex=True) 

This effectively removes the value that was giving you problem by searching only the string values, matching the ', and replacing it by nothing. It is a relatively simple one-line solution.

CodePudding user response:

I believe there is a better solution but temporarily you can use something like:

a=([10.703, 63.756, 18.48, 5.313, 14.553, 6.853, 7.7886, 7.3304, 6.083,
   9.163, 45.276, 100.716, 4.543, 8.2467, 15.3384, 9.7598, 14.3798,
   12.0698, 10.9148, 7.623, 3.8423, 8.4893, 6.9223, 14.0718, 47.0393,
   20.79, 15.939, 7.7, 13.629, "'0.7469", 1.1165, "'0.5352",
   "'0.3889", "'0.1001", "'0.3388", 1.7402, "'0.5313", 1.3052,
   "'0.7893", "'0.5583", 4.8895, 2.3524, 7.6538, "'0.0809", "'0.3504",
   5.8366, 1.0126, 3.7884, 7.5576, 3.0107, "'0.0347", 7.6384, 1.3552,
   4.2928, 1.7364, 1.0857, "'0.7123", 3.1108, "'0.5968", "'0.5544",
   1.6093, "'0.1887", "'0.6661", "'0.6391", 1.3514, "'0.9548"])


import pandas as pd
df=pd.DataFrame(a,columns=['value'])

for i,j in df.iterrows():
    
     try:
         if j['value'].find("'") != -1:
              df.loc[i, 'value'] = j['value'].replace("'","")
         else:
             pass
     except:
         pass

df['value']=df['value'].astype(float)

simply, check the value for each row, if there is a quote replace it, do nothing otherwise.

  • Related