Converting percent column to float in pandas

Time:07-14

I am trying to convert a set of columns to float, but I keep getting a ValueError. I have also tried .astype('float') and I get the same error. Below is the code I am using right now.


pct_columns = ['R_SIG_STR_pct', 'B_SIG_STR_pct', 'R_TD_pct', 'B_TD_pct']

for column in pct_columns:
    df[column] = df[column].apply(lambda X: float(X.replace('%',''))/100)

This is the error I am getting:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [127], in <cell line: 3>()
      1 pct_columns = ['R_SIG_STR_pct', 'B_SIG_STR_pct', 'R_TD_pct', 'B_TD_pct']
      3 for column in pct_columns:
----> 4     df[column] = df[column].apply(lambda X: float(X.replace('%',''))/100)

File ~/opt/anaconda3/envs/book_env/lib/python3.8/site-packages/pandas/core/series.py:4108, in Series.apply(self, func, convert_dtype, args, **kwds)
   4106     else:
   4107         values = self.astype(object)._values
-> 4108         mapped = lib.map_infer(values, f, convert=convert_dtype)
   4110 if len(mapped) and isinstance(mapped[0], Series):
   4111     # GH 25959 use pd.array instead of tolist
   4112     # so extension arrays can be used
   4113     return self._constructor_expanddim(pd_array(mapped), index=self.index)

File pandas/_libs/lib.pyx:2467, in pandas._libs.lib.map_infer()

Input In [127], in <lambda>(X)
      1 pct_columns = ['R_SIG_STR_pct', 'B_SIG_STR_pct', 'R_TD_pct', 'B_TD_pct']
      3 for column in pct_columns:
----> 4     df[column] = df[column].apply(lambda X: float(X.replace('%',''))/100)

ValueError: could not convert string to float: '---'

Answer:

The last line of the traceback pinpoints the problem: some cells contain the placeholder string '---', which float() cannot parse:

ValueError: could not convert string to float: '---'
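Before picking a fix, it can help to list every value that fails to parse, in case '---' is not the only placeholder in the data. The sketch below is one way to do that; the sample values and the helper name find_unparseable are hypothetical (only the column names come from the question), and it uses pd.to_numeric(errors='coerce') as an approximation of what float() accepts.

```python
import pandas as pd

# Hypothetical sample data; column names borrowed from the question's pct_columns
df = pd.DataFrame({
    'R_SIG_STR_pct': ['45%', '---', '60%'],
    'B_TD_pct': ['33%', '50%', '---'],
})
pct_columns = ['R_SIG_STR_pct', 'B_TD_pct']


def find_unparseable(df, columns):
    """Hypothetical helper: return {column: sorted distinct values that do not parse as numbers}."""
    problems = {}
    for column in columns:
        # Drop the '%' sign the same way the question's lambda does
        stripped = df[column].astype(str).str.replace('%', '', regex=False)
        # errors='coerce' turns anything unparseable into NaN instead of raising
        parsed = pd.to_numeric(stripped, errors='coerce')
        # Ignore cells that were already missing; keep only the ones parsing broke
        bad = df.loc[parsed.isna() & df[column].notna(), column]
        problems[column] = sorted(bad.unique().tolist())
    return problems


print(find_unparseable(df, pct_columns))
# {'R_SIG_STR_pct': ['---'], 'B_TD_pct': ['---']}
```

If the output shows other placeholders besides '---', whichever approach you choose has to handle those too.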

Two possible approaches would be:

  1. Remove rows containing the value '---' before converting the strings to float.
import pandas as pd
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', '---', 'foo', 'bar', 'foo', 'bar'],
    'B': ['one', '---', 'two', 'three', 'two', 'two', 'one', 'three', 'one', 'three'],
    'C': ['---', '25.84%', '---', '54.32%', '---', '42.73%', '---', '---', '---', '---']
})
# Keep only the rows in which no column equals '---'
df = df[~df.eq('---').any(axis=1)]
  2. Replace '---' with a value that can be converted to float.
import pandas as pd
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', '---', 'foo', 'bar', 'foo', 'bar'],
    'B': ['one', '---', 'two', 'three', 'two', 'two', 'one', 'three', 'one', 'three'],
    'C': ['---', '25.84%', '---', '54.32%', '---', '42.73%', '---', '---', '---', '---']
})
# Every '---' (in any column) becomes '0.00%', which the original lambda can parse
df.replace('---', '0.00%', inplace=True)

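You could also replace '---' with a float such as 0.0 directly; in that case the conversion must skip values that are already numbers, because float objects have no .replace() method. Which replacement value is appropriate depends on your needs.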
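If a missing value should not be counted as 0%, another option (not part of the answer above) is to let pandas turn the placeholders into NaN with pd.to_numeric(errors='coerce'). A minimal sketch, assuming the percentage columns hold strings such as '25.84%'; the toy frame and the pct_columns list here are hypothetical.

```python
import pandas as pd

# Toy frame modelled on the example above; only column C holds percentages
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'C': ['---', '25.84%', '---', '54.32%'],
})
pct_columns = ['C']  # hypothetical; use your own percentage columns

for column in pct_columns:
    # Strip a trailing '%', then parse. errors='coerce' maps '---' (and any other
    # non-numeric placeholder) to NaN instead of raising ValueError.
    # Assumes the column holds strings: .str methods return NaN for non-strings.
    numeric = pd.to_numeric(df[column].str.rstrip('%'), errors='coerce')
    df[column] = numeric / 100

print(df['C'].tolist())
```

The column ends up as float64 with NaN where '---' was, and reductions such as mean() skip NaN by default. If you later decide missing values should count as 0%, df[pct_columns] = df[pct_columns].fillna(0) gives the same numbers as the '0.00%' replacement.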
