Parquet File Encoding - Storing Azure Blob Storage - Failing with Error Error:encoding RLE

I have a pandas dataframe(after pivoted) which i am trying to save the file as parquet and store it in azure blob storage . The file is getting stored in the storage account, however when i try to read the parquet file, i am getting the error - "encoding RLE_DICTIONARY is not supported." Can some one help me how to fix this.

Here is the below code which i am using for saving the dataframe.

blob_service_client = BlobServiceClient.from_connection_string(<<connectionstring>>)
blob_client = blob_service_client.get_blob_client(container='<<container Name>>', blob="department/abc.parquet")
parquet_file = BytesIO()
df.to_parquet(parquet_file,engine='pyarrow',index=False)
parquet_file.seek(0)
blob_client.upload_blob(data=parquet_file)

CodePudding user response：

After reproducing from my end, I could able to make this work by uploading Iterable data using upload_blob. In my case I'm iterating over the Bytes of the file. Below is the code that worked for me.

from azure.storage.blob import BlobServiceClient
import pandas as pd

storage_account_name='<STORAGE_ACCOUNT_NAME>'
storage_account_key='<STORAGE_ACCOUNT_KEY>'
container_name='<CONTAINER_NAME>'
service_client = BlobServiceClient(account_url="{}://{}.blob.core.windows.net/".format("https",storage_account_name), credential=storage_account_key)
    
container_client = service_client.get_container_client(container=container_name)

file = open("userdata5.parquet", "rb")
data = file.read()
file_client = container_client.upload_blob(name="abc.parquet",data=data)
file.close

print("Done Uploading")

RESULTS: enter image description here

abc.parquet

enter image description here

CodePudding user response：

I was able to fix this issue by adding this code. df.to_parquet(parquet_file,engine='pyarrow',index=False,use_dictionary=False)