With a csv that has only one column, when running
pd.read_csv('/MYPATH/MYFILE.csv')
I get
ParserError: Error tokenizing data. C error: Expected 10 fields in line 4, saw 16
Or the long output:
/usr/local/lib/python3.7/dist-packages/pandas/io/parsers.py in read(self, nrows)
2155 def read(self, nrows=None):
2156 try:
-> 2157 data = self._reader.read(nrows)
2158 except StopIteration:
2159 if self._first_chunk:
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()
pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error()
ParserError: Error tokenizing data. C error: Expected 10 fields in line 4, saw 16
Apparently the one-column csv is not read as one column; the default separator seems to split it into several. I therefore set the separator to None, but running
pd.read_csv('/MYPATH/MYFILE.csv', sep=None)
throws
/usr/local/lib/python3.7/dist-packages/pandas/io/parsers.py in _alert_malformed(self, msg, row_num)
2996 """
2997 if self.error_bad_lines:
-> 2998 raise ParserError(msg)
2999 elif self.warn_bad_lines:
3000 base = f"Skipping line {row_num}: "
ParserError: Expected 68 fields in line 26, saw 147
Which delimiter (separator) can I pass so that the column is not split at all?
CodePudding user response:
You need to use a separator that never appears in your data. The separator only splits the input into columns, not into rows, so we can do this:
pd.read_csv('/MYPATH/MYFILE.csv', sep="§§§")
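or any other character(s) that certainly do not occur in the csv. The file is then read as a single column, because the separator never matches and nothing gets split. Note that a separator longer than one character is interpreted as a regular expression and makes pandas use the Python parsing engine.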
Without this, the separator defaults to sep=",", which finds the commas inside the values of the "one-column" csv and splits each line at them.
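As a minimal sketch of the idea, the snippet below reads a small in-memory stand-in for the file. The sample data, the column name `text`, and the choice of `"\x1f"` (the ASCII unit separator) are assumptions made for illustration only; any character that is guaranteed not to occur in your data should work the same way.

```python
import io

import pandas as pd

# Hypothetical stand-in for MYFILE.csv: one column whose values contain commas.
raw = "text\na,b,c\nd,e\nf,g,h,i\n"

# Assumption: "\x1f" never occurs in the data, so no line is ever split.
df = pd.read_csv(io.StringIO(raw), sep="\x1f")

print(df.shape)             # one column, one row per data line
print(df["text"].tolist())  # the commas stay inside the values
```

A single-character separator like this one is not treated as a regular expression, which is one reason to prefer it over a multi-character string.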
Credits go to "Import CSV into pandas dataframe with list as column".