I am trying to remove all non-number values from a specific column using pandas:
(a) I want to change all the last column values to float. There are some non-zero numeric values in the column that I want to preserve as floats.
(b) If non-numeric values exist, I want to replace them all to 0.0 (float). For example, in line 247, the last column has "a", I would like to change it to 0.0.
I have tried:
def isnumber(x):
try:
float(x)
return True
except:
return False
df = pd.read_csv(filename)
df = df[clean_up.applymap(isnumber)]
This however changes every column:
CodePudding user response:
First of all, you should read your csv file with header=None
, skipinitialspace=True
and quoting=1
as arguments of read_csv
because your header looks like a normal row, you have space after each field separator and you have some values enclosed by double quotes.
To convert your last column, use pd.numeric
:
df = pd.read_csv('tubedata.csv', header=None, quoting=1, skipinitialspace=True)
df[5] = pd.to_numeric(df[5], errors='coerce').fillna(0)
Output:
# Before conversion
>>> df[5].unique()
array(['0', '3', '2', '4', '6', 'a'], dtype=object)
# After conversion
>>> df[5].unique()
array([0., 3., 2., 4., 6.])
CodePudding user response:
You can apply pd.to_numeric
to every column by mapping it over the DataFrame
. Using errors='coerce'
turns columns that cannot be converted to NaN
so they can be easily dropped in the following step:
df.apply(pd.to_numeric, errors='coerce').dropna(axis=1)
To allow for bad values (such as "a"
) within columns that are otherwise numeric, use fillna
instead:
pd.to_numeric(df['some column'], errors='coerce').fillna(0.0)