Error when importing CSV files into a pandas DataFrame

Time:03-22

I am importing csv files with this code:

# import necessary libraries
from typing import Counter
import pandas as pd
import os
import glob
from datetime import datetime
import numpy as np
 
# folder that contains this script (the CSV files live here too)
path = os.path.dirname(os.path.abspath(__file__))

# Delete file_name.csv so it is not combined with the other CSV files:
# first check whether the file exists, then call os.remove() with its full path

del_file = os.path.join(path, 'file_name.csv')

## If file exists, delete it ##
if os.path.isfile(del_file):
    os.remove(del_file)
    print("File deleted")
else:    ## Show an error ##
    print("File not found: " + del_file)

# use glob to get all the csv files
csv_files = glob.glob(os.path.join(path, "*.csv"))
df_list = []

# read these columns as strings
dict_conv = {'line_item': str,
             'column_item': str}

# loop over the list of csv files
for f in csv_files:     
    # read the csv file
    df = pd.read_csv(f, sep=";", converters = dict_conv)
    df_list.append(df)
    #print the location and filename
    print('Location:', f)
    print('File Name:', os.path.basename(f))

# combine the data frames into a single data frame
RLI_combined = pd.concat(df_list, axis=0)

But I get this error when 'file_name.csv' is not in the directory:

File not found: d:\Python\Pandas concat csv files\v_1\file_name.csv
Traceback (most recent call last):
  File "d:\Python\Pandas concat csv files\v_1\from pathlib import Path.py", line 37, in <module>
    df = pd.read_csv(f, sep=";", converters = dict_conv)
  File "C:\Users\Anaconda3\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Anaconda3\lib\site-packages\pandas\io\parsers\readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\Anaconda3\lib\site-packages\pandas\io\parsers\readers.py", line 482, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Users\Anaconda3\lib\site-packages\pandas\io\parsers\readers.py", line 811, in __init__
    self._engine = self._make_engine(self.engine)
  File "C:\Users\Anaconda3\lib\site-packages\pandas\io\parsers\readers.py", line 1040, in _make_engine      
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "C:\Users\Anaconda3\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 69, in __init__   
    self._reader = parsers.TextReader(self.handles.handle, **kwds)
  File "pandas\_libs\parsers.pyx", line 542, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas\_libs\parsers.pyx", line 642, in pandas._libs.parsers.TextReader._get_header
  File "pandas\_libs\parsers.pyx", line 843, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas\_libs\parsers.pyx", line 1917, in pandas._libs.parsers.raise_parser_error
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 534: invalid continuation byte

What is happening here? The script is supposed to delete the file so that it won't be included in the data frame with the other CSV files. I hope you can point me in the right direction.
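
To see which of the globbed files actually triggers the UnicodeDecodeError, each read can be wrapped in try/except and the failing path reported. This is a minimal sketch, assuming semicolon-separated files in a single folder as in the question; `find_undecodable_csvs` is a hypothetical helper name, not part of pandas.

```python
import glob
import os

import pandas as pd


def find_undecodable_csvs(folder, sep=";", encoding="utf-8"):
    """Return the CSV files in `folder` that pandas cannot decode with `encoding`."""
    bad_files = []
    for f in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        try:
            pd.read_csv(f, sep=sep, encoding=encoding)
        except UnicodeDecodeError as exc:
            # Report the file name together with the byte/position from the error
            print(f"Cannot decode {os.path.basename(f)} as {encoding}: {exc}")
            bad_files.append(f)
    return bad_files
```

Calling `find_undecodable_csvs(path)` with the script's folder lists the offending files, which can then be opened in an editor to check how they were saved.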

Answer:

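The "File not found" line is only the else branch of your delete check, so the deletion is not the problem. The traceback comes from pd.read_csv failing on one of the other CSV files in the folder, which contains a byte (0xe5) that is not valid UTF-8, the encoding pandas uses by default. Passing encoding='latin1' to read_csv has helped me with files that contain special characters like this. Kindly try: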

df = pd.read_csv(f, sep=";", converters=dict_conv, encoding='latin1')
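
Forcing latin1 works because Latin-1 maps every one of the 256 byte values to a character, so decoding never fails; byte 0xe5 in particular is å. The flip side is that a file that really is UTF-8 would have its non-ASCII characters garbled (for example ä read as Ã¤). If the folder may contain both kinds of file, the following is a minimal sketch that tries UTF-8 first and falls back to Latin-1 only when decoding fails. `read_csv_any_encoding` is a hypothetical helper, not a pandas API.

```python
import pandas as pd


def read_csv_any_encoding(path, encodings=("utf-8", "latin1"), **kwargs):
    """Read `path` with the first encoding in `encodings` that decodes without error.

    Extra keyword arguments (sep, converters, ...) are passed through to pd.read_csv.
    """
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc, **kwargs)
        except UnicodeDecodeError as exc:
            last_error = exc  # this encoding failed; try the next one
    raise last_error
```

In the question's loop this would replace the read_csv call: df = read_csv_any_encoding(f, sep=";", converters=dict_conv). Because Latin-1 never raises, it should stay last in the list; any encoding placed after it would never be tried.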