Errors reading CSV with Pandas

I have a dataset of 100 million rows that I need to analyze. I use this function to read the file:

csv2020=pd.read_csv('filename.txt',
                    sep="\t",
                    error_bad_lines=False,
                    usecols=['field1', 'field2', 'field3', 'field4'],
                    dtype={'field1': int,'field2': float, 'field3': float, 'field4': float})

But I'm getting an error saying that a value on one of the lines cannot be converted to a float:

ValueError: could not convert string to float: 'ORCH'

I would like to omit any lines where this error occurs, but I don't know how to do that other than with the error_bad_lines argument. Help?

Thanks!

CodePudding user response：

Some of the columns you are trying to import as float contain strings, so they cannot be converted.

Read the CSV first without the dtype argument and inspect the resulting dataframe to see which values are not numeric.
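
One way to do that inspection is sketched below. It is only an illustration: the in-memory sample stands in for the real file, its values are made up (apart from 'ORCH', taken from the error message), and the names raw, as_numbers, bad_mask and bad_rows are hypothetical. Every column is read as a string, then rows are flagged where a value is present but cannot be parsed as a number.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for the tab-separated file.
sample = io.StringIO(
    "field1\tfield2\tfield3\tfield4\n"
    "1\t1.5\t2.5\t3.5\n"
    "2\tORCH\t4.0\t5.0\n"
    "3\t6.0\t7.0\t8.0\n"
)

cols = ["field1", "field2", "field3", "field4"]

# Read everything as strings so nothing fails at parse time.
raw = pd.read_csv(sample, sep="\t", usecols=cols, dtype=str)

# Try the numeric conversion; values that cannot be parsed become NaN.
as_numbers = raw.apply(pd.to_numeric, errors="coerce")

# A value is "bad" if it was present in the file but did not parse.
bad_mask = as_numbers.isna() & raw.notna()

# Keep only the rows that contain at least one bad value.
bad_rows = raw[bad_mask.any(axis=1)]
print(bad_rows)
```

On a file with 100 million rows it may be more practical to run this on a subset first, for example by passing nrows to read_csv.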

CodePudding user response：

The error_bad_lines option is not meant for this: it only applies to lines with an incorrect number of fields.

Read your file without the dtype option and do the conversion afterwards using pandas.to_numeric with the errors='coerce' option:

df = pd.read_csv(…)
df['field1'] = pd.to_numeric(df['field1'], errors='coerce')
df['field2'] = …
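
Since the goal in the question is to omit the offending lines, a possible end-to-end version of this approach is sketched here. The sample data is made up and the names cols and clean are hypothetical; the sketch assumes it is acceptable to drop any row where at least one of the four columns fails to convert.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for the tab-separated file.
sample = io.StringIO(
    "field1\tfield2\tfield3\tfield4\n"
    "1\t1.5\t2.5\t3.5\n"
    "2\tORCH\t4.0\t5.0\n"
    "3\t6.0\t7.0\t8.0\n"
)

cols = ["field1", "field2", "field3", "field4"]

# No dtype argument here, so a stray string does not raise an error.
df = pd.read_csv(sample, sep="\t", usecols=cols)

# Convert each column; values that cannot be parsed become NaN.
for col in cols:
    df[col] = pd.to_numeric(df[col], errors="coerce")

# Drop rows where any conversion failed, then make field1 an integer again
# (a column containing NaN is held as float until those rows are removed).
clean = df.dropna(subset=cols).astype({"field1": "int64"})
print(clean)
```

Note that dropna also removes rows whose values were genuinely missing in the file, not only rows with unparseable strings, so it is worth checking how many rows are dropped.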