I'm having difficulty finding the encoding of an xlsx file. When I use pd.read_csv(file), it raises an error: "UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 15-16: invalid continuation byte".
Then I tried looping through a list of many encodings, but it still doesn't work.
The data looks like:
PK ! ... [Content_Types].xml ... PK ! ... _rels/.rels ... PK ! ... xl/workbook.xml ... PK ! ... xl/_rels/workbook.xml.rels ... PK ! ... xl/worksheets/sheet1.xml ... (unreadable binary content, truncated)
CodePudding user response:
The read_csv function expects data in comma-separated values (CSV) format, which is plain text. Excel saves .xlsx files, which are binary files containing Excel-specific data, so no text encoding will decode them.
To create a file that this function can read, open the file in Excel and use Save As to save it as a .csv file. Make sure to keep the original, as the CSV file will not contain any formatting (font, color, number format, etc.).
Alternatively, you can use read_excel, as mentioned in aozk's answer.
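If you are not sure which kind of file you have, you can check the content instead of trusting the extension. The dump in the question starts with PK and lists xl/workbook.xml, which is what a ZIP-based Excel workbook looks like. Here is a minimal sketch of that check. The helper names looks_like_xlsx and load_table are made up for illustration, and load_table assumes pandas is installed together with an Excel engine such as openpyxl.

```python
import zipfile


def looks_like_xlsx(path):
    """Return True if `path` is a ZIP archive that contains an Excel workbook part."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # Excel normally stores the workbook under this member name,
        # as seen in the dump from the question.
        return "xl/workbook.xml" in archive.namelist()


def load_table(path):
    """Hypothetical helper: pick the pandas reader from the file's content."""
    # Imported lazily so the check above works without pandas.
    # Assumption: pandas plus an Excel engine (e.g. openpyxl) are installed.
    import pandas as pd

    if looks_like_xlsx(path):
        return pd.read_excel(path)  # reads the first sheet by default
    return pd.read_csv(path)
```

Calling load_table(file) in place of pd.read_csv(file) should then return a DataFrame for either kind of file, as long as the CSV case really is UTF-8 text.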
CodePudding user response:
Why don't you use pandas.read_excel?
For text files, look at the encoding parameter of pandas.read_csv, or call decode() on the raw bytes.
You can also use the chardet library if you want to detect a text encoding.
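For a file that really is text, the two approaches could look roughly like this. It is only a sketch: guess_encoding and guess_with_chardet are illustrative names, the candidate list is an arbitrary example, and the second function assumes the third-party chardet package is installed.

```python
def guess_encoding(raw, candidates=("utf-8", "cp1252", "shift_jis")):
    """Return the first candidate that decodes `raw` without error, else None."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


def guess_with_chardet(raw):
    """Statistical guess using chardet (assumption: the package is installed)."""
    import chardet

    # chardet.detect returns a dict; its "encoding" entry may be None.
    return chardet.detect(raw)["encoding"]
```

You would read the file with open(path, "rb").read(), pass the bytes to one of these, and hand the result to pd.read_csv(path, encoding=...). Keep in mind that a successful decode only means the bytes are valid in that encoding, not that the guess is right, and that none of this helps with an .xlsx file, which is not text at all.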