How to choose encoding type for the read_csv of pandas

Time:09-26

I am having difficulty finding the encoding type of an xlsx file. When I use pd.read_csv(file), it raises an error: "UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 15-16: invalid continuation byte". I then tried looping through a list of many encoding types, but it still doesn't work. The data looks like this (unreadable binary bytes abbreviated):

PK ... [Content_Types].xml ... PK ... _rels/.rels ... PK ... xl/workbook.xml ... PK ... xl/_rels/workbook.xml.rels ... PK ... xl/worksheets/sheet1.xml ...

User response:

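The read_csv function expects plain-text data in comma-separated values (CSV) format. Excel's .xlsx files are not text at all: they are ZIP archives of XML files, which is why your data starts with PK (the ZIP signature) and contains names like [Content_Types].xml and xl/workbook.xml. No choice of encoding will turn such a file into valid CSV.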
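A quick way to confirm this before choosing a reader is to look at the first bytes of the file. The sketch below relies on the fact that ZIP archives begin with the signature b"PK\x03\x04" and that .xlsx workbooks store their main part as xl/workbook.xml; the function name is_xlsx_like is hypothetical, and the check is a heuristic rather than full validation.

```python
import zipfile

# Local file header signature found at the start of ZIP archives (and .xlsx files)
ZIP_SIGNATURE = b"PK\x03\x04"


def is_xlsx_like(path):
    """Return True if the file looks like an Excel workbook (ZIP container with xl/workbook.xml)."""
    with open(path, "rb") as f:
        if f.read(4) != ZIP_SIGNATURE:
            return False  # plain text such as CSV, or some other format
    try:
        with zipfile.ZipFile(path) as zf:
            # .xlsx packages keep the workbook definition under xl/workbook.xml
            return "xl/workbook.xml" in zf.namelist()
    except zipfile.BadZipFile:
        return False  # starts like a ZIP but is not a readable archive
```

If this returns True, reach for an Excel reader instead of experimenting with encodings.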

To create a file that this function can read, open the file in Excel and use Save As to save it as a .csv file. Keep the original, because the CSV file will not contain any formatting (font, color, number format, etc.).

Alternatively, you can use read_excel as mentioned in aozk's answer.
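A minimal sketch of the read_excel route, assuming the openpyxl package is installed (pandas uses it as the engine for .xlsx files). The helper name load_workbook_sheet and the file names in the usage comments are hypothetical; sheet_name=0 selects the first sheet. Writing the result out with to_csv is a programmatic stand-in for Excel's Save As step.

```python
import pandas as pd


def load_workbook_sheet(path, sheet=0):
    """Read one sheet of an .xlsx file into a DataFrame (needs openpyxl installed)."""
    # read_excel parses the ZIP/XML container itself, so no text encoding is chosen here
    return pd.read_excel(path, sheet_name=sheet)


# Example usage (data.xlsx and data.csv are hypothetical file names):
# df = load_workbook_sheet("data.xlsx")
# df.to_csv("data.csv", index=False, encoding="utf-8")  # plain-text copy for read_csv
```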

User response:

Why don't you use pandas.read_excel?

Look at the encoding parameter of the pandas.read_csv function, or use encode() on your string.
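The encoding parameter only helps with genuine text files such as CSV. For a CSV whose encoding is unknown, one hedged approach (similar to the loop the question describes) is to try a short list of candidates in order. The helper name and the candidate list are assumptions. latin-1 maps every byte to a character, so it never fails and must come last; a successful read therefore does not prove the text was decoded correctly.

```python
import io

import pandas as pd

# Candidate encodings, strictest first; latin-1 accepts any byte, so it is the last resort.
CANDIDATES = ("utf-8", "cp1252", "latin-1")


def read_csv_any_encoding(source, encodings=CANDIDATES):
    """Try read_csv with each encoding in turn; return (DataFrame, encoding used)."""
    last_error = None
    for enc in encodings:
        try:
            if isinstance(source, (bytes, bytearray)):
                # wrap raw bytes in a fresh buffer so each attempt starts at the beginning
                df = pd.read_csv(io.BytesIO(source), encoding=enc)
            else:
                df = pd.read_csv(source, encoding=enc)
            return df, enc
        except UnicodeDecodeError as exc:
            last_error = exc  # wrong guess; try the next candidate
    raise ValueError(f"none of {encodings} could decode the data") from last_error
```

For example, "José" saved in cp1252 fails the utf-8 attempt with the same "invalid continuation byte" error as in the question and is then read with cp1252.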

Or you can use the chardet library if you want to detect a text file's encoding.
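A minimal sketch using chardet's detect() function on a sample of the file's raw bytes, assuming chardet is installed (pip install chardet). The helper name guess_encoding and the sample size are hypothetical choices. Detection is a statistical guess, so check the returned confidence, and note that running it on a binary .xlsx file would give a meaningless answer.

```python
import chardet  # third-party package: pip install chardet


def guess_encoding(path, sample_size=64 * 1024):
    """Guess a text file's encoding from its first bytes.

    Returns (encoding, confidence); encoding may be None if chardet cannot decide.
    """
    with open(path, "rb") as f:
        raw = f.read(sample_size)
    result = chardet.detect(raw)  # dict with 'encoding', 'confidence', 'language'
    return result["encoding"], result["confidence"]


# Example usage with pandas imported as pd (data.csv is a hypothetical file name):
# enc, conf = guess_encoding("data.csv")
# df = pd.read_csv("data.csv", encoding=enc)
```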
