Dealing with greater than and less than values in numeric data when reading csv in pandas

Time:08-02

My CSV file contains numeric data in which some values are prefixed with a greater-than or less-than symbol, e.g. ">244". I want the column's data type to be float. When reading the file into pandas:

import pandas as pd

df = pd.read_csv('file.csv')

I get a warning:

Columns (2) have mixed types. Specify dtype option on import or set low_memory=False.

I have read the question "Pandas read_csv: low_memory and dtype options" and tried specifying the data type of the relevant column with:

df = pd.read_csv('file.csv', dtype={'column': 'float'})

However, this gives an error:

ValueError: could not convert string to float: '>244'
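The error can be reproduced without a file. A minimal sketch, assuming a small hypothetical in-memory sample (read through io.StringIO) in place of file.csv:

```python
import io

import pandas as pd

# Hypothetical sample standing in for file.csv: one value has a '>' prefix.
csv_text = "id,column\n1,12.5\n2,>244\n3,7\n"

message = None
try:
    pd.read_csv(io.StringIO(csv_text), dtype={"column": "float"})
except ValueError as exc:
    # The float conversion fails on '>244', so no DataFrame is returned.
    message = str(exc)

print(message)
```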

I have also tried

df = pd.read_csv('file.csv', dtype={'column': 'float'}, error_bad_lines=False)

However, this does not solve my problem, and I get the same error as above.
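A likely reason this makes no difference: the bad-lines option only concerns rows with the wrong number of fields, not values that fail type conversion. The sketch below assumes pandas 1.3 or later, where error_bad_lines=False was deprecated in favour of on_bad_lines="skip" (the old argument was removed in pandas 2.0); the sample data is hypothetical.

```python
import io

import pandas as pd

# Hypothetical data: the row for id 2 has an extra field, id 3 has a '>' value.
csv_text = "id,column\n1,12.5\n2,3.0,99\n3,>244\n"

# on_bad_lines="skip" drops the malformed row (id 2)...
df = pd.read_csv(io.StringIO(csv_text), on_bad_lines="skip")
print(df)

# ...but '>244' is a well-formed field, so it is kept and the column stays text.
print(df["column"].dtype)
```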

My problem appears to be that the column contains a mixture of strings and floats. Can I ignore any rows that contain strings in particular columns when reading in the data?

Answer:

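One option is to read the column as strings first:

import pandas as pd

df = pd.read_csv('file.csv', dtype={'column': 'str'})

Then convert it with pd.to_numeric. With errors='coerce', any value that cannot be parsed as a number (such as '>244') becomes NaN instead of raising an error: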

df['column'] = pd.to_numeric(df['column'], errors='coerce')
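Put together, a minimal sketch of this approach, assuming a hypothetical in-memory CSV in place of file.csv. Values such as '>244' become NaN; if, as the question asks, those rows should be ignored entirely, they can be dropped with dropna afterwards:

```python
import io

import pandas as pd

# Hypothetical sample data standing in for file.csv.
csv_text = "id,column\n1,12.5\n2,>244\n3,<0.5\n4,7\n"

# Read the mixed column as text so read_csv does not try to parse it.
df = pd.read_csv(io.StringIO(csv_text), dtype={"column": "str"})

# Anything that is not a plain number ('>244', '<0.5') becomes NaN.
df["column"] = pd.to_numeric(df["column"], errors="coerce")

# Optionally drop the rows that could not be converted.
df = df.dropna(subset=["column"])
print(df)
```

Note that dropna also removes rows whose value was already missing in the file; skip that step to keep them as NaN.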

Answer:

I found a workaround. First, read in the data as usual:

import pandas as pd

df = pd.read_csv('file.csv')

Then drop any rows whose value starts with '<' or '>':

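# keep only rows whose first character is not '<' or '>'
df = df.loc[df['column'].str[:1] != '<']
# .copy() gives an independent frame, avoiding SettingWithCopyWarning on the later assignment
df = df.loc[df['column'].str[:1] != '>'].copy()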

Finally, convert the remaining values to numbers with pd.to_numeric:

df['column'] = pd.to_numeric(df['column'])
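The same workaround as a self-contained sketch, again on hypothetical in-memory data. Here the two .loc filters are folded into a single isin check on the first character, which should keep the same rows:

```python
import io

import pandas as pd

# Hypothetical sample data standing in for file.csv.
csv_text = "id,column\n1,12.5\n2,>244\n3,<0.5\n4,7\n"

df = pd.read_csv(io.StringIO(csv_text))

# First character of each value in the text column.
first_char = df["column"].str[:1]

# Keep only rows that do not start with '<' or '>'; copy to get an independent frame.
df = df.loc[~first_char.isin(["<", ">"])].copy()

# The remaining values are plain numbers stored as text.
df["column"] = pd.to_numeric(df["column"])
print(df)
```

Unlike the errors='coerce' approach, pd.to_numeric here raises an error if any other non-numeric text remains in the column, which can serve as a check that '<' and '>' were the only problem values.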