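```python
from os import listdir
from os.path import isfile, join

import pandas as pd

# log_files_folder_path and region are assumed to be defined earlier in the script.
# Build the path of the folder that holds the log files for this region.
region_log_filepath = join(log_files_folder_path, region)
# files holds the full path of every regular file in that folder
files = [join(region_log_filepath, file) for file in listdir(region_log_filepath)
         if isfile(join(region_log_filepath, file))]
for file in files:
    if file.endswith('.csv'):
        # strip everything up to 'Log-' and the '.csv' extension
        filename = file.split('Log-')[-1].split('.csv')[0]
        print(f'\nreading file: {filename}')
        log_file = pd.read_csv(file, encoding='unicode_escape')
```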
The code above fails with this error:

```
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 26850-26851: truncated \UXXXXXXXX escape
```

I looked it up and found a post suggesting converting it to a raw string. How would I add the `r` prefix to `file` in the `pd.read_csv()` call?
Answer:
Pass the `encoding` argument to `read_csv()` that matches the encoding the file was actually saved in. Common choices to try are:

- "utf-8"
- "ISO-8859-1" (also accepted as "latin" or "latin-1")
- "cp1252"

Syntax: `pd.read_csv(file, encoding="utf-8")`

See https://docs.python.org/3/library/codecs.html#standard-encodings for the full list of encodings.
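If you don't know which of these encodings a file uses, one option is to try them in order and keep the first one that decodes the whole file without error. This is a minimal stdlib-only sketch: `guess_encoding` is a hypothetical helper name and the candidate order is an assumption you can adjust. A clean decode only proves that no byte was invalid, not that the characters are the intended ones, so spot-check the result.

```python
from pathlib import Path

# Candidate encodings, tried in order (an assumption; adjust as needed).
# latin-1 goes last because it maps every byte to a character and
# therefore never raises a decode error.
DEFAULT_CANDIDATES = ("utf-8", "cp1252", "latin-1")


def guess_encoding(path, candidates=DEFAULT_CANDIDATES):
    """Return the first encoding in `candidates` that decodes the whole file."""
    data = Path(path).read_bytes()
    for enc in candidates:
        try:
            data.decode(enc)
        except UnicodeDecodeError:
            continue  # some byte sequence is invalid in this encoding
        return enc
    raise ValueError(f"none of {candidates} could decode {path}")
```

In the loop from the question this would be used as `log_file = pd.read_csv(file, encoding=guess_encoding(file))`. The file is read twice, which is usually acceptable for log files of moderate size.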
Answer:
> I tried looking it up and found a post suggesting to convert it to a raw string. How would I add r' to file in the pd.read_csv() function ?
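A raw string won't help here. The `r` prefix only changes how a string literal typed in source code is parsed; `file` is already a `str` built from `listdir()`, so there is nothing to add it to. The `\U` sequence is in the file's contents, and it is `encoding='unicode_escape'` that tries to interpret it as an escape code. Use `encoding='utf-8'` in `read_csv()` instead: it ignores those Unicode-like sequences and reads the backslashes as the ordinary characters they are.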
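A small standard-library sketch reproduces the difference. The sample bytes are made up for illustration (a Windows-style path inside CSV data) and are not taken from the asker's file.

```python
# The bytes below spell a two-line CSV whose second line is: C:\Users\data
raw = b"path\nC:\\Users\\data\n"

# 'unicode_escape' treats backslash sequences as escape codes, so "\U"
# must be followed by eight hex digits; "sers\da" is not, so decoding fails.
try:
    raw.decode("unicode_escape")
except UnicodeDecodeError as exc:
    print("unicode_escape failed:", exc.reason)

# A real text encoding keeps the backslash as an ordinary character.
text = raw.decode("utf-8")
print(text.splitlines()[1])  # prints C:\Users\data

# The r prefix only changes how a literal is written in source code;
# both spellings below produce the same string value.
assert r"C:\Users" == "C:\\Users"
```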