Add r' to a Python variable

from os import listdir
from os.path import isfile, join

import pandas as pd

# Build the path to the log files folder for the specific region
region_log_filepath = join(log_files_folder_path, region)

# files holds the full path of every regular file in that folder
files = [join(region_log_filepath, file) for file in listdir(region_log_filepath) if isfile(join(region_log_filepath, file))]

for file in files:
    if file.endswith('csv'):
        filename = (file.split('Log-')[-1]).split('.csv')[0]
        print(f'\nreading file: {filename}')
        log_file = pd.read_csv(file, encoding='unicode_escape')

The code above raises this error: UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 26850-26851: truncated \UXXXXXXXX escape

I looked it up and found a post suggesting that I convert the path to a raw string. How would I add r' to file in the pd.read_csv() call?

CodePudding user response：

Pass an encoding argument to read_csv() that matches the encoding the file was actually saved in.

If you are not sure which one that is, try one of these:

"utf-8",
"ISO-8859-1", 
"latin",
"cp1252"

Syntax: pd.read_csv(file, encoding="utf-8")

See https://docs.python.org/3/library/codecs.html#standard-encodings for the full list of standard encodings.
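Trying the encodings one by one can be automated. The following is only a minimal sketch: detect_encoding and its candidates parameter are hypothetical names, not part of pandas, and it reads the whole file into memory, so it assumes the log files are reasonably small. ISO-8859-1 is placed last because it accepts any byte sequence and would otherwise hide the other candidates.

```python
from pathlib import Path


def detect_encoding(path, candidates=("utf-8", "cp1252", "ISO-8859-1")):
    """Return the first candidate encoding that decodes the file without error.

    Hypothetical helper for illustration; a successful decode does not prove
    the encoding is the right one, only that it is a possible one.
    """
    raw = Path(path).read_bytes()
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the bytes, try the next
        return enc
    raise ValueError(f"none of {candidates} could decode {path}")
```

The returned name could then be passed on, for example pd.read_csv(file, encoding=detect_encoding(file)).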

CodePudding user response：

"I tried looking it up and found a post suggesting to convert it to a raw string. How would I add r' to file in the pd.read_csv() function?"

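You do not need a raw string here. The r prefix only affects how a string literal written in source code is parsed; it cannot be applied to a variable, and the path stored in file is not what fails. The error comes from encoding='unicode_escape', which makes the decoder interpret backslash sequences inside the file's contents as escapes. Use encoding='utf-8' in read_csv instead. Those escape-like sequences are then left alone and read as the literal characters they are.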
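To make the difference concrete, here is a small sketch that uses only the standard library. The byte string is made-up sample content, assumed to resemble a log line containing a Windows path; it is not taken from the actual log files.

```python
# The r prefix only changes how a literal in source code is parsed.
literal = r"C:\Users\logs"      # raw literal: backslashes kept as written
path_var = "C:\\Users\\logs"    # same string, written with escaped backslashes
same_string = literal == path_var

# Made-up file content containing a backslash followed by "U".
raw_bytes = b"path,value\nC:\\Users\\logs,1\n"

# Decoding the content as unicode_escape treats \U as the start of an escape.
error_message = None
try:
    raw_bytes.decode("unicode_escape")
except UnicodeDecodeError as err:
    error_message = str(err)

# Decoding as UTF-8 leaves the backslashes untouched.
text = raw_bytes.decode("utf-8")

print(same_string)
print(error_message)
print(text)
```

Both spellings of the path produce the same string object contents, so there is nothing to "add" to a variable; only the codec used on the file's bytes decides whether the backslashes are interpreted.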