pd.read_csv for a "one-column" import: which sep avoids split as: "ParserError: Error

Time:10-02

With a csv that has only one column, when running

pd.read_csv('/MYPATH/MYFILE.csv')

I get

ParserError: Error tokenizing data. C error: Expected 10 fields in line 4, saw 16

Or the long output:

/usr/local/lib/python3.7/dist-packages/pandas/io/parsers.py in read(self, nrows)
   2155     def read(self, nrows=None):
   2156         try:
-> 2157             data = self._reader.read(nrows)
   2158         except StopIteration:
   2159             if self._first_chunk:

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error()

ParserError: Error tokenizing data. C error: Expected 10 fields in line 4, saw 16

Apparently pandas does not read the one-column CSV as a single column; the default separator seems to split it. I therefore set the separator to None, but running

pd.read_csv('/MYPATH/MYFILE.csv', sep=None)

throws

/usr/local/lib/python3.7/dist-packages/pandas/io/parsers.py in _alert_malformed(self, msg, row_num)
   2996         """
   2997         if self.error_bad_lines:
-> 2998             raise ParserError(msg)
   2999         elif self.warn_bad_lines:
   3000             base = f"Skipping line {row_num}: "

ParserError: Expected 68 fields in line 26, saw 147

Which delimiter (separator) does not split the column at all?

CodePudding user response:

Use a separator that never appears in your data. The separator only splits each line into columns; it has no effect on how the input is split into rows. So this works:

pd.read_csv('/MYPATH/MYFILE.csv', sep="§§§", engine="python")

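or any other character or string that is certain not to occur in the CSV. The parser then finds nothing to split on, so every line is read as a single column. Note that a separator longer than one character is treated as a regular expression and needs the Python parsing engine; passing engine="python" makes that explicit and avoids a ParserWarning.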
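
As a minimal sketch of this, the snippet below reads a small in-memory sample instead of the real file. The sample text and the column name `text` are made up for illustration; `engine="python"` is passed because pandas interprets separators longer than one character as regular expressions.

```python
import io

import pandas as pd

# Made-up sample: one logical column whose values contain commas
raw = "text\na,b,c\nd,e\nf,g,h,i\n"
sep = "§§§"

# Sanity check: the chosen separator must not occur anywhere in the data
sep_is_unused = sep not in raw

# A multi-character sep is interpreted as a regex, so use the Python engine
df = pd.read_csv(io.StringIO(raw), sep=sep, engine="python")
print(df)
```

If the check on `sep_is_unused` fails for real data, pick a different marker string before reading the file.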

Without an explicit sep, the default sep="," is used, and it finds commas inside the values of the "one-column" CSV. sep=None does not help either: it does not disable splitting, it tells pandas to guess the delimiter.
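
The mismatched field counts in the error message can be reproduced by hand. A rough illustration, using made-up lines rather than the real file: splitting on commas gives a different number of fields per line, while a separator that never occurs always gives exactly one.

```python
# Made-up lines standing in for rows of the "one-column" file
lines = ["a,b,c", "d,e", "f,g,h,i"]

# Number of fields each line would have with sep=","
fields_with_comma = [len(line.split(",")) for line in lines]

# Number of fields with a separator that never occurs in the data
fields_with_marker = [len(line.split("§§§")) for line in lines]

print(fields_with_comma)   # [3, 2, 4]
print(fields_with_marker)  # [1, 1, 1]
```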

Credits go to Import CSV into pandas dataframe with list as column.
