Replace non-numeric column values to float using pandas-CodePudding

I am trying to remove all non-number values from a specific column using pandas:

(a) I want to change all the last column values to float. There are some non-zero numeric values in the column that I want to preserve as floats.

(b) If non-numeric values exist, I want to replace them all to 0.0 (float). For example, in line 247, the last column has "a", I would like to change it to 0.0.

I have tried:

def isnumber(x):
    try:
        float(x)
        return True
    except:
        return False

df = pd.read_csv(filename)
df = df[clean_up.applymap(isnumber)]

This however changes every column:

CodePudding user response：

First of all, you should read your csv file with header=None, skipinitialspace=True and quoting=1 as arguments of read_csv because your header looks like a normal row, you have space after each field separator and you have some values enclosed by double quotes.

To convert your last column, use pd.numeric:

df = pd.read_csv('tubedata.csv', header=None, quoting=1, skipinitialspace=True)
df[5] = pd.to_numeric(df[5], errors='coerce').fillna(0)

Output:

# Before conversion
>>> df[5].unique()
array(['0', '3', '2', '4', '6', 'a'], dtype=object)

# After conversion
>>> df[5].unique()
array([0., 3., 2., 4., 6.])

CodePudding user response：

You can apply pd.to_numeric to every column by mapping it over the DataFrame. Using errors='coerce' turns columns that cannot be converted to NaN so they can be easily dropped in the following step:

df.apply(pd.to_numeric, errors='coerce').dropna(axis=1)

To allow for bad values (such as "a") within columns that are otherwise numeric, use fillna instead:

pd.to_numeric(df['some column'], errors='coerce').fillna(0.0)