I'm trying to read a csv.gz file in Python. I fetch the file with urllib.request.urlopen, and then I run into two problems. First, the content comes back as bytes, and I think I need it as UTF-8 text in order to use pandas. Second, I don't really understand how to read this type of file with pandas; I want to end up with a DataFrame, but it isn't clear to me how to get there. This is what I've tried so far: I used decode, but I don't trust that approach because it only works when I ignore the errors. At this point I'm not even sure the decoding step is really necessary.
I would really appreciate any help with this. Thanks in advance.
CodePudding user response:
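import pandas as pd

# on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
df = pd.read_csv('sample.tar.gz', compression='gzip', header=0, sep=' ', quotechar='"', on_bad_lines='skip')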
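
If the file comes from a URL, as in the question, you probably don't need to call decode yourself: pandas can decompress the gzip bytes and decode them (UTF-8 by default) in one step. Here is a minimal sketch of that idea. The helper names `read_csv_gz_bytes` and `read_csv_gz_url` and the sample data are made up for illustration, and I'm assuming the file is a plain gzip-compressed CSV with comma separators, so adjust `sep` or `encoding` if yours differs.

```python
import gzip
import io
from urllib.request import urlopen

import pandas as pd


def read_csv_gz_bytes(raw: bytes, **read_csv_kwargs) -> pd.DataFrame:
    """Parse gzip-compressed CSV bytes into a DataFrame (hypothetical helper)."""
    # Wrap the bytes in a file-like object; pandas decompresses and decodes
    # them itself, so no manual .decode() call is needed.
    return pd.read_csv(io.BytesIO(raw), compression="gzip", **read_csv_kwargs)


def read_csv_gz_url(url: str, **read_csv_kwargs) -> pd.DataFrame:
    """Download a .csv.gz file and parse it (hypothetical helper)."""
    with urlopen(url) as response:
        raw = response.read()  # still compressed bytes at this point
    return read_csv_gz_bytes(raw, **read_csv_kwargs)


# Local demonstration with made-up data, so no network access is required.
sample = gzip.compress("name,city\nAna,Bogotá\nLuis,Lima\n".encode("utf-8"))
df = read_csv_gz_bytes(sample)
print(df)
```

With a real file you would call something like `read_csv_gz_url(your_url)`. If the data is not UTF-8, passing `encoding="latin-1"` (or whatever the real encoding is) through the keyword arguments is usually a better fix than ignoring decode errors.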