Can not read csv with pandas in azure functions with python

Time:11-28

I have created an Azure Blob Storage trigger in an Azure Function written in Python. When a CSV file is added to blob storage, I try to read it with pandas.

import logging
import pandas as pd

import azure.functions as func


def main(myblob: func.InputStream):
    logging.info(f"Python blob trigger function processed blob \n"
                 f"Name: {myblob.name}\n"
                 f"Blob Size: {myblob.length} bytes")

    df_new = pd.read_csv(myblob)
    print(df_new.head())

If I pass myblob to pd.read_csv, then I get UnsupportedOperation: read1

Python blob trigger function processed blob 
Name: samples-workitems/Data_26112022_080027.csv
Blob Size: None bytes
[2022-11-27T16:19:25.650Z] Executed 'Functions.BlobTrigger1' (Failed, Id=2df388f5-a8dc-4554-80fa-f809cfaeedfe, Duration=1472ms)
[2022-11-27T16:19:25.655Z] System.Private.CoreLib: Exception while executing function: Functions.BlobTrigger1. System.Private.CoreLib: Result: Failure
Exception: UnsupportedOperation: read1

If I pass myblob.read(),

df_new = pd.read_csv(myblob.read())

it gives TypeError: Expected file path name or file-like object, got <class 'bytes'> type

Python blob trigger function processed blob 
Name: samples-workitems/Data_26112022_080027.csv
Blob Size: None bytes
[2022-11-27T16:09:56.513Z] Executed 'Functions.BlobTrigger1' (Failed, Id=e3825c28-7538-4e30-bad2-2526f9811697, Duration=1468ms)
[2022-11-27T16:09:56.518Z] System.Private.CoreLib: Exception while executing function: Functions.BlobTrigger1. System.Private.CoreLib: Result: Failure
Exception: TypeError: Expected file path name or file-like object, got <class 'bytes'> type

From Azure functions Docs:

InputStream is File-like object representing an input blob.

From Pandas read_csv Docs:

read_csv takes filepath_or_buffer: str, path object or file-like object

So technically I should be able to read this object. What piece of the puzzle am I missing here?

Answer:

According to this article, the code below works. Wrapping the bytes in BytesIO gives pandas a file-like object that supports read1. Note that this loads the entire file into memory, so it is suitable for smaller files but not recommended for large ones.

import logging
import pandas as pd

import azure.functions as func
from io import BytesIO

def main(myblob: func.InputStream):
    logging.info(f"Python blob trigger function processed blob \n"
                 f"Name: {myblob.name}\n"
                 f"Blob Size: {myblob.length} bytes")
    df_new = pd.read_csv(BytesIO(myblob.read()))
    print(df_new.head())
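
To see why the direct call fails, here is a minimal sketch that runs without Azure. It assumes func.InputStream behaves like an io.BufferedIOBase subclass that implements read() but inherits the default read1(), which raises UnsupportedOperation. The names ReadOnlyStream, parse_rows and read_csv_from_stream are hypothetical, and the stdlib csv module with io.TextIOWrapper stands in for the text decoding pandas is assumed to do on a binary handle.

```python
import csv
import io


class ReadOnlyStream(io.BufferedIOBase):
    """Hypothetical stand-in for func.InputStream: read() works, read1() is inherited."""

    def __init__(self, data: bytes):
        self._inner = io.BytesIO(data)

    def readable(self) -> bool:
        return True

    def read(self, size=-1):
        return self._inner.read(size)


def parse_rows(binary_stream):
    """Decode a binary stream through TextIOWrapper and parse it as CSV."""
    text = io.TextIOWrapper(binary_stream, encoding="utf-8", newline="")
    return list(csv.reader(text))


def read_csv_from_stream(stream):
    """Buffer the whole stream into BytesIO, then hand it to pandas."""
    import pandas as pd  # imported lazily so the stdlib demo below runs without pandas

    return pd.read_csv(io.BytesIO(stream.read()))


raw = b"a,b\n1,2\n3,4\n"

# Passing the stream directly fails: TextIOWrapper calls read1(), which the
# stand-in only inherits from BufferedIOBase, where it is unsupported.
try:
    parse_rows(ReadOnlyStream(raw))
    direct_error = None
except io.UnsupportedOperation as exc:
    direct_error = str(exc)

# Buffering into BytesIO first works, because BytesIO implements read1().
rows = parse_rows(io.BytesIO(ReadOnlyStream(raw).read()))
print(direct_error, rows)
```

In the function itself, read_csv_from_stream(myblob) would replace the pd.read_csv(myblob) call. The memory caveat above still applies, since the whole blob is read before parsing.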