Following is the StringIO object value:
csv_log_stream.getvalue()
Raw Output
'"2022-06-04 12:02:40,248",azure_functions_worker,INFO,"Successfully processed FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 5af2b92f-7e82-4515-89c5-846737ba3e60,function Name: ProcessWebSaleExportFilesInRSBlobStorage"\n"2022-06-04 12:02:40,252",azure_functions_worker,INFO,"Received FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 0d487a72-2ded-487e-b269-2ac913e3fcebfunction Name: ReadIntegrationInterfaceConfiguration"\n"2022-06-04 12:02:40,259",azure_functions_worker,INFO,"Successfully processed FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 0d487a72-2ded-487e-b269-2ac913e3fceb,function Name: ReadIntegrationInterfaceConfiguration"\n"2022-06-04 12:02:40,261",azure_functions_worker,INFO,"Received FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: c42740c8-21fd-4435-a5cb-7b9f74dc7225function Name: SaveLogsToRSBlobStorage"\n"2022-06-04 12:02:40,265",azure_functions_worker,INFO,"Successfully processed FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: c42740c8-21fd-4435-a5cb-7b9f74dc7225,function Name: SaveLogsToRSBlobStorage"\n"2022-06-04 12:02:43,000",azure_functions_worker,INFO,"Received FunctionInvocationRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 5af2b92f-7e82-4515-89c5-846737ba3e60, function name: ProcessWebSaleExportFilesInRSBlobStorage, invocation ID: c42bf678-d155-4859-a71a-b0108645080d, function type: sync, sync threadpool max workers: 1000"\n"2022-06-04 12:02:43,007",root,INFO,Python HTTP trigger :: ProcessWebSaleExportFilesInRSBlobStorage function processed a request.\n"2022-06-04 12:02:43,008",root,INFO,Processing Request object started for the desired parameters.\n"2022-06-04 12:02:43,009",root,INFO,Processing Request object completed for the desired parameters.\n"2022-06-04 12:02:43,010",root,INFO,Processing Request object started for the desired 
parameters.\n"2022-06-04 12:02:43,011",root,INFO,Processing Request object completed for the desired parameters.\n"2022-06-04 12:02:43,041",azure.core.pipeline.policies.http_logging_policy,INFO,"Request URL: \'https://koxdsrssa.blob.core.windows.net/koxds-export?restype=REDACTED&comp=REDACTED&prefix=REDACTED&st=REDACTED&se=REDACTED&sp=REDACTED&sv=REDACTED&sr=REDACTED&sig=REDACTED\'\nRequest method: \'GET\'\nRequest headers:\n \'x-ms-version\': \'REDACTED\'\n \'Accept\': \'application/xml\'\n \'User-Agent\': \'azsdk-python-storage-blob/12.12.0 Python/3.8.12 (Windows-10-10.0.19044-SP0)\'\n \'x-ms-date\': \'REDACTED\'\n \'x-ms-client-request-id\': \'79b647c5-e3ed-11ec-8c08-48a4728e3a8b\'\nNo body was attached to the request"\n"2022-06-04 12:02:43,564",azure.core.pipeline.policies.http_logging_policy,INFO,"Response status: 200\nResponse headers:\n \'Transfer-Encoding\': \'chunked\'\n \'Content-Type\': \'application/xml\'\n \'Server\': \'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0\'\n \'x-ms-request-id\': \'4a6cab6b-e01e-002d-5ffa-77769c000000\'\n \'x-ms-client-request-id\': \'79b647c5-e3ed-11ec-8c08-48a4728e3a8b\'\n \'x-ms-version\': \'REDACTED\'\n \'Access-Control-Expose-Headers\': \'REDACTED\'\n \'Access-Control-Allow-Origin\': \'REDACTED\'\n \'Date\': \'Sat, 04 Jun 2022 10:02:43 GMT\'"\n"2022-06-04 12:02:44,070",azure.core.pipeline.policies.http_logging_policy,INFO,"Request URL: \'https://koxdsrssa.blob.core.windows.net/koxds-export/WebSale/Test/2022_06_03_20_13_23_782-0500_c841f873-9a12-4402-a164-5819cbcddc3e_Test_0.json?st=REDACTED&se=REDACTED&sp=REDACTED&sv=REDACTED&sr=REDACTED&sig=REDACTED\'\nRequest method: \'GET\'\nRequest headers:\n \'x-ms-range\': \'REDACTED\'\n \'x-ms-version\': \'REDACTED\'\n \'Accept\': \'application/xml\'\n \'User-Agent\': \'azsdk-python-storage-blob/12.12.0 Python/3.8.12 (Windows-10-10.0.19044-SP0)\'\n \'x-ms-date\': \'REDACTED\'\n \'x-ms-client-request-id\': \'7a5398fe-e3ed-11ec-a414-48a4728e3a8b\'\nNo body was attached to the 
request"\n"2022-06-04 12:02:44,226",azure.core.pipeline.policies.http_logging_policy,INFO,"Response status: 206\nResponse headers:\n \'Content-Length\': \'8337358\'\n \'Content-Type\': \'application/json\'\n \'Content-Range\': \'REDACTED\'\n \'Last-Modified\': \'Sat, 04 Jun 2022 01:14:56 GMT\'\n \'Accept-Ranges\': \'REDACTED\'\n \'ETag\': \'""0x8DA45C7A2F73E96""\'\n \'Server\': \'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0\'\n \'x-ms-request-id\': \'4a6cad87-e01e-002d-3cfa-77769c000000\'\n \'x-ms-client-request-id\': \'7a5398fe-e3ed-11ec-a414-48a4728e3a8b\'\n \'x-ms-version\': \'REDACTED\'\n \'x-ms-creation-time\': \'REDACTED\'\n \'x-ms-blob-content-md5\': \'REDACTED\'\n \'x-ms-lease-status\': \'REDACTED\'\n \'x-ms-lease-state\': \'REDACTED\'\n \'x-ms-blob-type\': \'REDACTED\'\n \'Content-Disposition\': \'REDACTED\'\n \'x-ms-server-encrypted\': \'REDACTED\'\n \'Access-Control-Expose-Headers\': \'REDACTED\'\n \'Access-Control-Allow-Origin\': \'REDACTED\'\n \'Date\': \'Sat, 04 Jun 2022 10:02:44 GMT\'"\n"2022-06-04 12:09:07,090",root,INFO,Total time taken: 6 minutes and 24 seconds\n'
Reading from StringIO to pandas.DataFrame:
df_logs = pd.read_csv(csv_log_stream, header=None)
Output:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "ProjectDir\\.venv\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "ProjectDir\\.venv\lib\site-packages\pandas\io\parsers\readers.py", line 680, in read_csv
return _read(filepath_or_buffer, kwds)
File "ProjectDir\\.venv\lib\site-packages\pandas\io\parsers\readers.py", line 575, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "ProjectDir\\.venv\lib\site-packages\pandas\io\parsers\readers.py", line 933, in __init__
self._engine = self._make_engine(f, self.engine)
File "ProjectDir\\.venv\lib\site-packages\pandas\io\parsers\readers.py", line 1235, in _make_engine
return mapping[engine](f, **self.options)
File "ProjectDir\\.venv\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 75, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas\_libs\parsers.pyx", line 551, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
The above attempt to read a DataFrame from the StringIO object throws an error. So I passed column names instead, and now I get an empty DataFrame:
df_logs = pd.read_csv(csv_log_stream, names=["Timestamp", "LogName", "LogLevel", "LogMessage"])
print(df_logs)
Output:
Empty DataFrame
Columns: [Timestamp, LogName, LogLevel, LogMessage]
Index: []
I cannot understand what I am doing wrong. The value of my input StringIO seems to be correct. What am I missing?
CodePudding user response:
It might be that you are calling pd.read_csv on the string that StringIO.getvalue() returns instead of on the StringIO object itself:
import pandas as pd
from io import StringIO
file = StringIO(
"\"2022-06-04 12:02:40,248\",azure_functions_worker,INFO,\"Successfully processed FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 5af2b92f-7e82-4515-89c5-846737ba3e60,function Name: ProcessWebSaleExportFilesInRSBlobStorage\"\n\"2022-06-04 12:02:40,252\",azure_functions_worker,INFO,\"Received FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 0d487a72-2ded-487e-b269-2ac913e3fcebfunction Name: ReadIntegrationInterfaceConfiguration\"\n\"2022-06-04 12:02:40,259\",azure_functions_worker,INFO,\"Successfully processed FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 0d487a72-2ded-487e-b269-2ac913e3fceb,function Name: ReadIntegrationInterfaceConfiguration\"\n\"2022-06-04 12:02:40,261\",azure_functions_worker,INFO,\"Received FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: c42740c8-21fd-4435-a5cb-7b9f74dc7225function Name: SaveLogsToRSBlobStorage\"\n\"2022-06-04 12:02:40,265\",azure_functions_worker,INFO,\"Successfully processed FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: c42740c8-21fd-4435-a5cb-7b9f74dc7225,function Name: SaveLogsToRSBlobStorage\"\n\"2022-06-04 12:02:43,000\",azure_functions_worker,INFO,\"Received FunctionInvocationRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 5af2b92f-7e82-4515-89c5-846737ba3e60, function name: ProcessWebSaleExportFilesInRSBlobStorage, invocation ID: c42bf678-d155-4859-a71a-b0108645080d, function type: sync, sync threadpool max workers: 1000\"\n\"2022-06-04 12:02:43,007\",root,INFO,Python HTTP trigger :: ProcessWebSaleExportFilesInRSBlobStorage function processed a request.\n\"2022-06-04 12:02:43,008\",root,INFO,Processing Request object started for the desired parameters.\n\"2022-06-04 12:02:43,009\",root,INFO,Processing Request object completed for the desired parameters.\n\"2022-06-04 12:02:43,010\",root,INFO,Processing Request object 
started for the desired parameters.\n\"2022-06-04 12:02:43,011\",root,INFO,Processing Request object completed for the desired parameters.\n\"2022-06-04 12:02:43,041\",azure.core.pipeline.policies.http_logging_policy,INFO,\"Request URL: 'https://koxdsrssa.blob.core.windows.net/koxds-export?restype=REDACTED&comp=REDACTED&prefix=REDACTED&st=REDACTED&se=REDACTED&sp=REDACTED&sv=REDACTED&sr=REDACTED&sig=REDACTED'\nRequest method: 'GET'\nRequest headers:\n 'x-ms-version': 'REDACTED'\n 'Accept': 'application/xml'\n 'User-Agent': 'azsdk-python-storage-blob/12.12.0 Python/3.8.12 (Windows-10-10.0.19044-SP0)'\n 'x-ms-date': 'REDACTED'\n 'x-ms-client-request-id': '79b647c5-e3ed-11ec-8c08-48a4728e3a8b'\nNo body was attached to the request\"\n\"2022-06-04 12:02:43,564\",azure.core.pipeline.policies.http_logging_policy,INFO,\"Response status: 200\nResponse headers:\n 'Transfer-Encoding': 'chunked'\n 'Content-Type': 'application/xml'\n 'Server': 'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0'\n 'x-ms-request-id': '4a6cab6b-e01e-002d-5ffa-77769c000000'\n 'x-ms-client-request-id': '79b647c5-e3ed-11ec-8c08-48a4728e3a8b'\n 'x-ms-version': 'REDACTED'\n 'Access-Control-Expose-Headers': 'REDACTED'\n 'Access-Control-Allow-Origin': 'REDACTED'\n 'Date': 'Sat, 04 Jun 2022 10:02:43 GMT'\"\n\"2022-06-04 12:02:44,070\",azure.core.pipeline.policies.http_logging_policy,INFO,\"Request URL: 'https://koxdsrssa.blob.core.windows.net/koxds-export/WebSale/Test/2022_06_03_20_13_23_782-0500_c841f873-9a12-4402-a164-5819cbcddc3e_Test_0.json?st=REDACTED&se=REDACTED&sp=REDACTED&sv=REDACTED&sr=REDACTED&sig=REDACTED'\nRequest method: 'GET'\nRequest headers:\n 'x-ms-range': 'REDACTED'\n 'x-ms-version': 'REDACTED'\n 'Accept': 'application/xml'\n 'User-Agent': 'azsdk-python-storage-blob/12.12.0 Python/3.8.12 (Windows-10-10.0.19044-SP0)'\n 'x-ms-date': 'REDACTED'\n 'x-ms-client-request-id': '7a5398fe-e3ed-11ec-a414-48a4728e3a8b'\nNo body was attached to the request\"\n\"2022-06-04 
12:02:44,226\",azure.core.pipeline.policies.http_logging_policy,INFO,\"Response status: 206\nResponse headers:\n 'Content-Length': '8337358'\n 'Content-Type': 'application/json'\n 'Content-Range': 'REDACTED'\n 'Last-Modified': 'Sat, 04 Jun 2022 01:14:56 GMT'\n 'Accept-Ranges': 'REDACTED'\n 'ETag': '\"\"0x8DA45C7A2F73E96\"\"'\n 'Server': 'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0'\n 'x-ms-request-id': '4a6cad87-e01e-002d-3cfa-77769c000000'\n 'x-ms-client-request-id': '7a5398fe-e3ed-11ec-a414-48a4728e3a8b'\n 'x-ms-version': 'REDACTED'\n 'x-ms-creation-time': 'REDACTED'\n 'x-ms-blob-content-md5': 'REDACTED'\n 'x-ms-lease-status': 'REDACTED'\n 'x-ms-lease-state': 'REDACTED'\n 'x-ms-blob-type': 'REDACTED'\n 'Content-Disposition': 'REDACTED'\n 'x-ms-server-encrypted': 'REDACTED'\n 'Access-Control-Expose-Headers': 'REDACTED'\n 'Access-Control-Allow-Origin': 'REDACTED'\n 'Date': 'Sat, 04 Jun 2022 10:02:44 GMT'\"\n\"2022-06-04 12:09:07,090\",root,INFO,Total time taken: 6 minutes and 24 seconds\n"
)
df = pd.read_csv(
    file,
    names=[
        "Timestamp",
        "LogName",
        "LogLevel",
        "LogMessage",
    ],
)
print(df)
# Output
Timestamp LogName \
0 2022-06-04 12:02:40,248 azure_functions_worker
1 2022-06-04 12:02:40,252 azure_functions_worker
2 2022-06-04 12:02:40,259 azure_functions_worker
3 2022-06-04 12:02:40,261 azure_functions_worker
4 2022-06-04 12:02:40,265 azure_functions_worker
5 2022-06-04 12:02:43,000 azure_functions_worker
6 2022-06-04 12:02:43,007 root
7 2022-06-04 12:02:43,008 root
8 2022-06-04 12:02:43,009 root
9 2022-06-04 12:02:43,010 root
10 2022-06-04 12:02:43,011 root
11 2022-06-04 12:02:43,041 azure.core.pipeline.policies.http_logging_policy
12 2022-06-04 12:02:43,564 azure.core.pipeline.policies.http_logging_policy
13 2022-06-04 12:02:44,070 azure.core.pipeline.policies.http_logging_policy
14 2022-06-04 12:02:44,226 azure.core.pipeline.policies.http_logging_policy
15 2022-06-04 12:09:07,090 root
LogLevel LogMessage
0 INFO Successfully processed FunctionLoadRequest, re...
1 INFO Received FunctionLoadRequest, request ID: 5bc6...
2 INFO Successfully processed FunctionLoadRequest, re...
3 INFO Received FunctionLoadRequest, request ID: 5bc6...
4 INFO Successfully processed FunctionLoadRequest, re...
5 INFO Received FunctionInvocationRequest, request ID...
6 INFO Python HTTP trigger :: ProcessWebSaleExportFil...
7 INFO Processing Request object started for the desi...
8 INFO Processing Request object completed for the de...
9 INFO Processing Request object started for the desi...
10 INFO Processing Request object completed for the de...
11 INFO Request URL: 'https://koxdsrssa.blob.core.wind...
12 INFO Response status: 200\nResponse headers:\n '...
13 INFO Request URL: 'https://koxdsrssa.blob.core.wind...
14 INFO Response status: 206\nResponse headers:\n '...
15 INFO Total time taken: 6 minutes and 24 seconds
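Be careful with StringIO objects: reading from one depends on its current stream position, which is not the same as working with a raw string. pd.read_csv reads from the current position onward, so a stream positioned at or past the end of its content yields no rows, even though getvalue() still returns the full text.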
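The stream position can be observed with the standard library alone. The following is a minimal sketch; the name `stream` and the sample text are made up for illustration and are not taken from the question:

```python
from io import StringIO

# Hypothetical stream filled by writing, as a log handler or CSV writer would do.
stream = StringIO()
stream.write("a,b\n1,2\n")        # writing advances the position to the end

position_after_write = stream.tell()
remaining = stream.read()         # reads from the current position: nothing is left
everything = stream.getvalue()    # ignores the position and returns the whole buffer

stream.seek(0)                    # rewind to the beginning
after_rewind = stream.read()      # now the full content is read

print(position_after_write, repr(remaining), repr(after_rewind))
```

Any reader that consumes the stream, pd.read_csv included, sees only what lies after the current position.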
Here is an example with the same "file" object:
file.seek(20000)  # Move the stream position to offset 20000, which is past the end of this content.
df = pd.read_csv(
    file,
    names=[
        "Timestamp",
        "LogName",
        "LogLevel",
        "LogMessage",
    ],
)
print(df)
# Output
Empty DataFrame
Columns: [Timestamp, LogName, LogLevel, LogMessage]
Index: []
Whereas:
file.seek(0) # Change the stream position to the beginning of the file
df = pd.read_csv(
    file,
    names=[
        "Timestamp",
        "LogName",
        "LogLevel",
        "LogMessage",
    ],
)
print(df)
# Output
LogLevel LogMessage
0 INFO Successfully processed FunctionLoadRequest, re...
1 INFO Received FunctionLoadRequest, request ID: 5bc6...
2 INFO Successfully processed FunctionLoadRequest, re...
3 INFO Received FunctionLoadRequest, request ID: 5bc6...
4 INFO Successfully processed FunctionLoadRequest, re...
5 INFO Received FunctionInvocationRequest, request ID...
6 INFO Python HTTP trigger :: ProcessWebSaleExportFil...
7 INFO Processing Request object started for the desi...
8 INFO Processing Request object completed for the de...
9 INFO Processing Request object started for the desi...
10 INFO Processing Request object completed for the de...
11 INFO Request URL: 'https://koxdsrssa.blob.core.wind...
12 INFO Response status: 200\nResponse headers:\n '...
13 INFO Request URL: 'https://koxdsrssa.blob.core.wind...
14 INFO Response status: 206\nResponse headers:\n '...
15 INFO Total time taken: 6 minutes and 24 seconds
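Applied to the situation in the question, the fix would probably look like the sketch below. It assumes that csv_log_stream was filled by writing to it (here simulated with csv.writer and two invented rows), which leaves the position at the end. The question does not show how the stream was populated, so that part is an assumption.

```python
import csv
from io import StringIO

import pandas as pd

columns = ["Timestamp", "LogName", "LogLevel", "LogMessage"]

# Assumed setup: rows are written into the stream, so the position ends up at the end.
csv_log_stream = StringIO()
writer = csv.writer(csv_log_stream, lineterminator="\n")
writer.writerow(["2022-06-04 12:02:43,007", "root", "INFO", "first message"])
writer.writerow(["2022-06-04 12:02:43,008", "root", "INFO", "second, with a comma"])

# Option 1: rewind the existing stream before handing it to pandas.
csv_log_stream.seek(0)
df_logs = pd.read_csv(csv_log_stream, names=columns)

# Option 2: wrap the full text in a fresh StringIO, which starts at position 0.
df_copy = pd.read_csv(StringIO(csv_log_stream.getvalue()), names=columns)

print(df_logs)
```

Option 2 does not depend on where the original stream is positioned, at the cost of copying the text once.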