Problem when I am trying to load txt in Jupyter Notebook

Time:09-27

I'm trying to load all the .txt files in a folder. The code below usually works when I load .txt files into pandas DataFrames and concatenate them, but in this case it fails and I don't know why.

Here is the code:

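import glob
import os

import pandas as pd

path = 'C:/Users/user/Documents/UNIAO'

# collect every .txt file in the folder
csv_files = glob.glob(os.path.join(path, "*.txt"))

list_of_dataframes = []
# loop over the list of txt files
for f in csv_files:
    # read the raw text once to guess the separator; "with" closes the file afterwards
    with open(f, "r", encoding='unicode_escape') as text_file:
        data = text_file.read()
    separator = data[4]  # assumes the 5th character of every file is the delimiter

    df = pd.read_csv(f, sep=separator, encoding='unicode_escape')
    list_of_dataframes.append(df)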
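The question mentions concatenating the frames, but the loop above only collects them, and taking the separator from a fixed character position (data[4]) breaks as soon as one file's header is laid out differently. Below is a minimal sketch of the complete pattern under a few assumptions: the files are UTF-8 unless you pass another encoding, the delimiter is one of comma, semicolon, tab or pipe, and all files share the same columns. It swaps the data[4] guess for the standard library's csv.Sniffer, which infers the delimiter from a sample of the file; the helper name load_txt_folder is hypothetical.

```python
import csv
import glob
import os

import pandas as pd


def load_txt_folder(path, pattern="*.txt", encoding="utf-8"):
    """Hypothetical helper: read every delimited text file in `path` into one DataFrame."""
    frames = []
    for filename in sorted(glob.glob(os.path.join(path, pattern))):
        # a sample from the start of the file is usually enough to guess the delimiter
        with open(filename, newline="", encoding=encoding) as fh:
            sample = fh.read(64 * 1024)
        # csv.Sniffer picks the candidate that occurs consistently on every sampled line;
        # it raises csv.Error if none of the candidates fits
        sep = csv.Sniffer().sniff(sample, delimiters=",;\t|").delimiter
        frames.append(pd.read_csv(filename, sep=sep, encoding=encoding))
    if not frames:
        return pd.DataFrame()
    # ignore_index=True renumbers the rows 0..n-1 across all files
    return pd.concat(frames, ignore_index=True)
```

For the folder in the question this would be called as load_txt_folder('C:/Users/user/Documents/UNIAO', encoding='latin-1') or with whatever encoding the files were actually saved in. A better delimiter guess does not fix a line that genuinely has an extra field, so the ParserError below can still occur.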

Here is the error message:

ParserError                               Traceback (most recent call last)
Cell In [5], line 19
     16 separator = data[4]
---> 19 df = pd.read_csv(f, sep=separator, encoding ='unicode_escape')
     20 print(f)
     23 list_of_dataframes.append(df)

File c:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\util\_decorators.py:311, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    305 if len(args) > num_allow_args:
    306     warnings.warn(
    307         msg.format(arguments=arguments),
    308         FutureWarning,
    309         stacklevel=stacklevel,
    310     )
--> 311 return func(*args, **kwargs)

File c:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py:680, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    665 kwds_defaults = _refine_defaults_read(
    666     dialect,
    667     delimiter,
   (...)
    676     defaults={"delimiter": ","},
    677 )
...
--> 739     raise ParserError(msg)
    740 elif self.on_bad_lines == self.BadLineHandleMethod.WARN:
    741     base = f"Skipping line {row_num}: "

ParserError: Expected 197 fields in line 11955, saw 198
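Before skipping anything, it can help to look at what is actually on the reported line. A minimal sketch using only the standard csv module; the helper name find_bad_lines is hypothetical, and it assumes the header row defines the expected number of fields. Its line numbers should match pandas' count as long as no quoted value spans several lines.

```python
import csv


def find_bad_lines(path, sep, encoding="utf-8"):
    """Hypothetical helper: list (line_number, field_count) for rows that differ from the header."""
    with open(path, newline="", encoding=encoding) as fh:
        reader = csv.reader(fh, delimiter=sep)
        header = next(reader)
        expected = len(header)
        bad = []
        for row in reader:
            # skip blank lines (pandas skips them too); flag rows with too many or too few fields
            if row and len(row) != expected:
                # reader.line_num is the 1-based number of the last physical line read
                bad.append((reader.line_num, len(row)))
    return expected, bad
```

Running it on the failing file with the same separator and encoding would show whether line 11955 is the only offender or one of many.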

Answer:

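The error means that line 11955 has 198 fields while pandas expected 197 (the number of columns it had already found), so that line contains one separator more than the others, for example because the separator character appears inside a value or because the line is corrupted. If you can afford to lose such lines, you could try: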

For Pandas >= 1.3.0

df = pd.read_csv(f, sep=separator, encoding ='unicode_escape', on_bad_lines='skip')

For Pandas < 1.3.0

df = pd.read_csv(f, sep=separator, encoding ='unicode_escape', error_bad_lines=False)
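If the same notebook has to run on both old and new pandas, one option is to pick the keyword at runtime. A minimal sketch; the helper name skip_bad_lines_kwargs is hypothetical, and it assumes pd.__version__ starts with plain "major.minor" numbers.

```python
import pandas as pd


def skip_bad_lines_kwargs():
    """Hypothetical helper: the read_csv keyword that skips malformed lines on this pandas."""
    # compare only major.minor, e.g. "2.2.3" -> (2, 2)
    major, minor = (int(part) for part in pd.__version__.split(".")[:2])
    if (major, minor) >= (1, 3):
        return {"on_bad_lines": "skip"}
    # pandas < 1.3; error_bad_lines was deprecated in 1.3 and removed in 2.0
    return {"error_bad_lines": False}
```

Usage: df = pd.read_csv(f, sep=separator, encoding='unicode_escape', **skip_bad_lines_kwargs())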

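Note that either option silently drops the offending lines, so their data is lost. With pandas >= 1.3.0 you can pass on_bad_lines='warn' instead to get a warning for each line that is skipped.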
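Since pandas 1.4.0, on_bad_lines also accepts a callable when engine='python': pandas calls it with the bad line already split on sep (a list of strings), and returning None drops the line. A minimal sketch that keeps a record of everything dropped so it can be inspected afterwards; the names bad_rows and record_and_skip are hypothetical, and the inline sample stands in for one of the real files.

```python
import io

import pandas as pd

bad_rows = []  # hypothetical list that collects every dropped line


def record_and_skip(fields):
    """Called by pandas for each malformed line; `fields` is that line split on `sep`."""
    bad_rows.append(fields)
    return None  # returning None tells pandas to drop the line


sample = "a;b;c\n1;2;3\n4;5;6;7\n8;9;10\n"  # the line 4;5;6;7 has one field too many
df = pd.read_csv(
    io.StringIO(sample),
    sep=";",
    engine="python",  # a callable on_bad_lines requires the python engine here
    on_bad_lines=record_and_skip,
)
print(df)
print(bad_rows)
```

The python engine is slower than the default C engine, so on large files it may be worth running this only once to find the bad lines and then fixing the source data.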

For more information, refer to the pandas documentation for read_csv.
