Dealing with greater than and less than values in numeric data when reading csv in pandas-CodePudding

My csv file contains numeric data where some values have greater than or less than symbols e.g. ">244". I want my data type to be a float. When reading the file into pandas:

df = pd.read_csv('file.csv')

I get a warning:

Columns (2) have mixed types. Specify dtype option on import or set low_memory=False.

I have checked this question: Pandas read_csv: low_memory and dtype options and tried specifying the date type of the relevant column with:

df = pd.read_csv('file.csv',dtype={'column':'float'})

However, this gives an error:

ValueError: could not convert string to float: '>244'

I have also tried

df = pd.read_csv('file.csv',dtype={'column':'float'}, error_bad_lines=False)

However this does not solve my problem, and I get the same error above.

My problem appears to be that my data has a mixture of string and floats. Can I ignore any rows containing strings in particular columns when reading in the data?

CodePudding user response：

You can use:

df = pd.read_csv('file.csv', dtype={'column':'str'})

Then:

df['column'] = pd.to_numeric(df['column'], errors='coerce')

CodePudding user response：

I found a workaround which was read in my data

df = pd.read_csv('file.csv')

Then remove any values with '<' or '>'

df = df.loc[df['column'].str[:1] != '<']
df = df.loc[df['column'].str[:1] != '>']

Then convert to numeric with pd.to_numeric

df['column'] = pd.to_numeric(df['column'])