I have created an Azure Blob Storage trigger in an Azure Function in Python. A CSV file is added to blob storage, and I try to read it with pandas.
import logging
import pandas as pd
import azure.functions as func

def main(myblob: func.InputStream):
    logging.info(f"Python blob trigger function processed blob\n"
                 f"Name: {myblob.name}\n"
                 f"Blob Size: {myblob.length} bytes")
    df_new = pd.read_csv(myblob)
    print(df_new.head())
If I pass myblob to pd.read_csv, I get UnsupportedOperation: read1:
Python blob trigger function processed blob
Name: samples-workitems/Data_26112022_080027.csv
Blob Size: None bytes
[2022-11-27T16:19:25.650Z] Executed 'Functions.BlobTrigger1' (Failed, Id=2df388f5-a8dc-4554-80fa-f809cfaeedfe, Duration=1472ms)
[2022-11-27T16:19:25.655Z] System.Private.CoreLib: Exception while executing function: Functions.BlobTrigger1. System.Private.CoreLib: Result: Failure
Exception: UnsupportedOperation: read1
If I instead pass myblob.read(),

df_new = pd.read_csv(myblob.read())

it gives TypeError: Expected file path name or file-like object, got <class 'bytes'> type:
Python blob trigger function processed blob
Name: samples-workitems/Data_26112022_080027.csv
Blob Size: None bytes
[2022-11-27T16:09:56.513Z] Executed 'Functions.BlobTrigger1' (Failed, Id=e3825c28-7538-4e30-bad2-2526f9811697, Duration=1468ms)
[2022-11-27T16:09:56.518Z] System.Private.CoreLib: Exception while executing function: Functions.BlobTrigger1. System.Private.CoreLib: Result: Failure
Exception: TypeError: Expected file path name or file-like object, got <class 'bytes'> type
From the Azure Functions docs:

InputStream is a file-like object representing an input blob.

From the pandas read_csv docs:

read_csv takes filepath_or_buffer: str, path object, or file-like object.
So technically I should be able to read this object. What piece of the puzzle am I missing here?
CodePudding user response:
If you refer to this article, it says that this piece of code will work: it wraps the raw bytes from myblob.read() in a BytesIO buffer, so pandas receives a proper seekable file-like object. However, this is only recommended for smaller files, since the entire file is loaded into memory; it is not recommended for larger files.
import logging
import pandas as pd
import azure.functions as func
from io import BytesIO

def main(myblob: func.InputStream):
    logging.info(f"Python blob trigger function processed blob\n"
                 f"Name: {myblob.name}\n"
                 f"Blob Size: {myblob.length} bytes")
    df_new = pd.read_csv(BytesIO(myblob.read()))
    print(df_new.head())
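If memory pressure is a concern, one option is pandas' chunksize parameter, which parses the buffer in fixed-size batches instead of building one large DataFrame at once. Note this only bounds the DataFrame size; the raw bytes from myblob.read() are still fully in memory. A minimal sketch, using an in-memory byte string as a stand-in for the blob payload:

```python
from io import BytesIO
import pandas as pd

# Stand-in for myblob.read(); in the real function this comes
# from the blob trigger binding
csv_bytes = b"id,value\n1,10\n2,20\n3,30\n"

# chunksize=2 makes read_csv yield DataFrames of at most 2 rows,
# so each batch can be processed and discarded in turn
total_rows = 0
for chunk in pd.read_csv(BytesIO(csv_bytes), chunksize=2):
    total_rows += len(chunk)

print(total_rows)  # 3
```

For genuinely large blobs, streaming directly from storage with the azure-storage-blob SDK would avoid holding the full payload in memory at all, but that is a separate approach from the trigger binding shown above.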