Pandas: Empty Dataframe when reading from StringIO with read_csv

Time:06-04

Here is the value of the StringIO object, as returned by:

csv_log_stream.getvalue()

Raw output:

'"2022-06-04 12:02:40,248",azure_functions_worker,INFO,"Successfully processed FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 5af2b92f-7e82-4515-89c5-846737ba3e60,function Name: ProcessWebSaleExportFilesInRSBlobStorage"\n"2022-06-04 12:02:40,252",azure_functions_worker,INFO,"Received FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 0d487a72-2ded-487e-b269-2ac913e3fcebfunction Name: ReadIntegrationInterfaceConfiguration"\n"2022-06-04 12:02:40,259",azure_functions_worker,INFO,"Successfully processed FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 0d487a72-2ded-487e-b269-2ac913e3fceb,function Name: ReadIntegrationInterfaceConfiguration"\n"2022-06-04 12:02:40,261",azure_functions_worker,INFO,"Received FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: c42740c8-21fd-4435-a5cb-7b9f74dc7225function Name: SaveLogsToRSBlobStorage"\n"2022-06-04 12:02:40,265",azure_functions_worker,INFO,"Successfully processed FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: c42740c8-21fd-4435-a5cb-7b9f74dc7225,function Name: SaveLogsToRSBlobStorage"\n"2022-06-04 12:02:43,000",azure_functions_worker,INFO,"Received FunctionInvocationRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 5af2b92f-7e82-4515-89c5-846737ba3e60, function name: ProcessWebSaleExportFilesInRSBlobStorage, invocation ID: c42bf678-d155-4859-a71a-b0108645080d, function type: sync, sync threadpool max workers: 1000"\n"2022-06-04 12:02:43,007",root,INFO,Python HTTP trigger :: ProcessWebSaleExportFilesInRSBlobStorage function processed a request.\n"2022-06-04 12:02:43,008",root,INFO,Processing Request object started for the desired parameters.\n"2022-06-04 12:02:43,009",root,INFO,Processing Request object completed for the desired parameters.\n"2022-06-04 12:02:43,010",root,INFO,Processing Request object started for the desired 
parameters.\n"2022-06-04 12:02:43,011",root,INFO,Processing Request object completed for the desired parameters.\n"2022-06-04 12:02:43,041",azure.core.pipeline.policies.http_logging_policy,INFO,"Request URL: \'https://koxdsrssa.blob.core.windows.net/koxds-export?restype=REDACTED&comp=REDACTED&prefix=REDACTED&st=REDACTED&se=REDACTED&sp=REDACTED&sv=REDACTED&sr=REDACTED&sig=REDACTED\'\nRequest method: \'GET\'\nRequest headers:\n    \'x-ms-version\': \'REDACTED\'\n    \'Accept\': \'application/xml\'\n    \'User-Agent\': \'azsdk-python-storage-blob/12.12.0 Python/3.8.12 (Windows-10-10.0.19044-SP0)\'\n    \'x-ms-date\': \'REDACTED\'\n    \'x-ms-client-request-id\': \'79b647c5-e3ed-11ec-8c08-48a4728e3a8b\'\nNo body was attached to the request"\n"2022-06-04 12:02:43,564",azure.core.pipeline.policies.http_logging_policy,INFO,"Response status: 200\nResponse headers:\n    \'Transfer-Encoding\': \'chunked\'\n    \'Content-Type\': \'application/xml\'\n    \'Server\': \'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0\'\n    \'x-ms-request-id\': \'4a6cab6b-e01e-002d-5ffa-77769c000000\'\n    \'x-ms-client-request-id\': \'79b647c5-e3ed-11ec-8c08-48a4728e3a8b\'\n    \'x-ms-version\': \'REDACTED\'\n    \'Access-Control-Expose-Headers\': \'REDACTED\'\n    \'Access-Control-Allow-Origin\': \'REDACTED\'\n    \'Date\': \'Sat, 04 Jun 2022 10:02:43 GMT\'"\n"2022-06-04 12:02:44,070",azure.core.pipeline.policies.http_logging_policy,INFO,"Request URL: \'https://koxdsrssa.blob.core.windows.net/koxds-export/WebSale/Test/2022_06_03_20_13_23_782-0500_c841f873-9a12-4402-a164-5819cbcddc3e_Test_0.json?st=REDACTED&se=REDACTED&sp=REDACTED&sv=REDACTED&sr=REDACTED&sig=REDACTED\'\nRequest method: \'GET\'\nRequest headers:\n    \'x-ms-range\': \'REDACTED\'\n    \'x-ms-version\': \'REDACTED\'\n    \'Accept\': \'application/xml\'\n    \'User-Agent\': \'azsdk-python-storage-blob/12.12.0 Python/3.8.12 (Windows-10-10.0.19044-SP0)\'\n    \'x-ms-date\': \'REDACTED\'\n    \'x-ms-client-request-id\': 
\'7a5398fe-e3ed-11ec-a414-48a4728e3a8b\'\nNo body was attached to the request"\n"2022-06-04 12:02:44,226",azure.core.pipeline.policies.http_logging_policy,INFO,"Response status: 206\nResponse headers:\n    \'Content-Length\': \'8337358\'\n    \'Content-Type\': \'application/json\'\n    \'Content-Range\': \'REDACTED\'\n    \'Last-Modified\': \'Sat, 04 Jun 2022 01:14:56 GMT\'\n    \'Accept-Ranges\': \'REDACTED\'\n    \'ETag\': \'""0x8DA45C7A2F73E96""\'\n    \'Server\': \'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0\'\n    \'x-ms-request-id\': \'4a6cad87-e01e-002d-3cfa-77769c000000\'\n    \'x-ms-client-request-id\': \'7a5398fe-e3ed-11ec-a414-48a4728e3a8b\'\n    \'x-ms-version\': \'REDACTED\'\n    \'x-ms-creation-time\': \'REDACTED\'\n    \'x-ms-blob-content-md5\': \'REDACTED\'\n    \'x-ms-lease-status\': \'REDACTED\'\n    \'x-ms-lease-state\': \'REDACTED\'\n    \'x-ms-blob-type\': \'REDACTED\'\n    \'Content-Disposition\': \'REDACTED\'\n    \'x-ms-server-encrypted\': \'REDACTED\'\n    \'Access-Control-Expose-Headers\': \'REDACTED\'\n    \'Access-Control-Allow-Origin\': \'REDACTED\'\n    \'Date\': \'Sat, 04 Jun 2022 10:02:44 GMT\'"\n"2022-06-04 12:09:07,090",root,INFO,Total time taken: 6 minutes and 24 seconds\n'

Reading from StringIO to pandas.DataFrame:

df_logs = pd.read_csv(csv_log_stream, header=None)

Output:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "ProjectDir\\.venv\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "ProjectDir\\.venv\lib\site-packages\pandas\io\parsers\readers.py", line 680, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "ProjectDir\\.venv\lib\site-packages\pandas\io\parsers\readers.py", line 575, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "ProjectDir\\.venv\lib\site-packages\pandas\io\parsers\readers.py", line 933, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "ProjectDir\\.venv\lib\site-packages\pandas\io\parsers\readers.py", line 1235, in _make_engine
    return mapping[engine](f, **self.options)
  File "ProjectDir\\.venv\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 75, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas\_libs\parsers.pyx", line 551, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file

The attempt above to read a DataFrame from the StringIO object raises an error. So I tried the following instead, and I get an empty DataFrame.

df_logs = pd.read_csv(csv_log_stream, names=["Timestamp", "LogName", "LogLevel", "LogMessage"])
print(df_logs)

Output:

Empty DataFrame
Columns: [Timestamp, LogName, LogLevel, LogMessage]
Index: []

I cannot work out what I am doing wrong. The value in my StringIO object looks correct. What am I missing?

Answer:

It might be that you are calling pd.read_csv on the string which StringIO.getvalue() outputs instead of the StringIO object itself:

import pandas as pd
from io import StringIO

file = StringIO(
    "\"2022-06-04 12:02:40,248\",azure_functions_worker,INFO,\"Successfully processed FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 5af2b92f-7e82-4515-89c5-846737ba3e60,function Name: ProcessWebSaleExportFilesInRSBlobStorage\"\n\"2022-06-04 12:02:40,252\",azure_functions_worker,INFO,\"Received FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 0d487a72-2ded-487e-b269-2ac913e3fcebfunction Name: ReadIntegrationInterfaceConfiguration\"\n\"2022-06-04 12:02:40,259\",azure_functions_worker,INFO,\"Successfully processed FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 0d487a72-2ded-487e-b269-2ac913e3fceb,function Name: ReadIntegrationInterfaceConfiguration\"\n\"2022-06-04 12:02:40,261\",azure_functions_worker,INFO,\"Received FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: c42740c8-21fd-4435-a5cb-7b9f74dc7225function Name: SaveLogsToRSBlobStorage\"\n\"2022-06-04 12:02:40,265\",azure_functions_worker,INFO,\"Successfully processed FunctionLoadRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: c42740c8-21fd-4435-a5cb-7b9f74dc7225,function Name: SaveLogsToRSBlobStorage\"\n\"2022-06-04 12:02:43,000\",azure_functions_worker,INFO,\"Received FunctionInvocationRequest, request ID: 5bc6ee11-9eaa-4479-902a-3e037ac08503, function ID: 5af2b92f-7e82-4515-89c5-846737ba3e60, function name: ProcessWebSaleExportFilesInRSBlobStorage, invocation ID: c42bf678-d155-4859-a71a-b0108645080d, function type: sync, sync threadpool max workers: 1000\"\n\"2022-06-04 12:02:43,007\",root,INFO,Python HTTP trigger :: ProcessWebSaleExportFilesInRSBlobStorage function processed a request.\n\"2022-06-04 12:02:43,008\",root,INFO,Processing Request object started for the desired parameters.\n\"2022-06-04 12:02:43,009\",root,INFO,Processing Request object completed for the desired parameters.\n\"2022-06-04 12:02:43,010\",root,INFO,Processing Request 
object started for the desired parameters.\n\"2022-06-04 12:02:43,011\",root,INFO,Processing Request object completed for the desired parameters.\n\"2022-06-04 12:02:43,041\",azure.core.pipeline.policies.http_logging_policy,INFO,\"Request URL: 'https://koxdsrssa.blob.core.windows.net/koxds-export?restype=REDACTED&comp=REDACTED&prefix=REDACTED&st=REDACTED&se=REDACTED&sp=REDACTED&sv=REDACTED&sr=REDACTED&sig=REDACTED'\nRequest method: 'GET'\nRequest headers:\n    'x-ms-version': 'REDACTED'\n    'Accept': 'application/xml'\n    'User-Agent': 'azsdk-python-storage-blob/12.12.0 Python/3.8.12 (Windows-10-10.0.19044-SP0)'\n    'x-ms-date': 'REDACTED'\n    'x-ms-client-request-id': '79b647c5-e3ed-11ec-8c08-48a4728e3a8b'\nNo body was attached to the request\"\n\"2022-06-04 12:02:43,564\",azure.core.pipeline.policies.http_logging_policy,INFO,\"Response status: 200\nResponse headers:\n    'Transfer-Encoding': 'chunked'\n    'Content-Type': 'application/xml'\n    'Server': 'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0'\n    'x-ms-request-id': '4a6cab6b-e01e-002d-5ffa-77769c000000'\n    'x-ms-client-request-id': '79b647c5-e3ed-11ec-8c08-48a4728e3a8b'\n    'x-ms-version': 'REDACTED'\n    'Access-Control-Expose-Headers': 'REDACTED'\n    'Access-Control-Allow-Origin': 'REDACTED'\n    'Date': 'Sat, 04 Jun 2022 10:02:43 GMT'\"\n\"2022-06-04 12:02:44,070\",azure.core.pipeline.policies.http_logging_policy,INFO,\"Request URL: 'https://koxdsrssa.blob.core.windows.net/koxds-export/WebSale/Test/2022_06_03_20_13_23_782-0500_c841f873-9a12-4402-a164-5819cbcddc3e_Test_0.json?st=REDACTED&se=REDACTED&sp=REDACTED&sv=REDACTED&sr=REDACTED&sig=REDACTED'\nRequest method: 'GET'\nRequest headers:\n    'x-ms-range': 'REDACTED'\n    'x-ms-version': 'REDACTED'\n    'Accept': 'application/xml'\n    'User-Agent': 'azsdk-python-storage-blob/12.12.0 Python/3.8.12 (Windows-10-10.0.19044-SP0)'\n    'x-ms-date': 'REDACTED'\n    'x-ms-client-request-id': '7a5398fe-e3ed-11ec-a414-48a4728e3a8b'\nNo body was 
attached to the request\"\n\"2022-06-04 12:02:44,226\",azure.core.pipeline.policies.http_logging_policy,INFO,\"Response status: 206\nResponse headers:\n    'Content-Length': '8337358'\n    'Content-Type': 'application/json'\n    'Content-Range': 'REDACTED'\n    'Last-Modified': 'Sat, 04 Jun 2022 01:14:56 GMT'\n    'Accept-Ranges': 'REDACTED'\n    'ETag': '\"\"0x8DA45C7A2F73E96\"\"'\n    'Server': 'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0'\n    'x-ms-request-id': '4a6cad87-e01e-002d-3cfa-77769c000000'\n    'x-ms-client-request-id': '7a5398fe-e3ed-11ec-a414-48a4728e3a8b'\n    'x-ms-version': 'REDACTED'\n    'x-ms-creation-time': 'REDACTED'\n    'x-ms-blob-content-md5': 'REDACTED'\n    'x-ms-lease-status': 'REDACTED'\n    'x-ms-lease-state': 'REDACTED'\n    'x-ms-blob-type': 'REDACTED'\n    'Content-Disposition': 'REDACTED'\n    'x-ms-server-encrypted': 'REDACTED'\n    'Access-Control-Expose-Headers': 'REDACTED'\n    'Access-Control-Allow-Origin': 'REDACTED'\n    'Date': 'Sat, 04 Jun 2022 10:02:44 GMT'\"\n\"2022-06-04 12:09:07,090\",root,INFO,Total time taken: 6 minutes and 24 seconds\n"
)


df = pd.read_csv(
    file,
    names=[
        "Timestamp",
        "LogName",
        "LogLevel",
        "LogMessage",
    ],
)
print(df)
# Output
                  Timestamp                                           LogName  \
0   2022-06-04 12:02:40,248                            azure_functions_worker
1   2022-06-04 12:02:40,252                            azure_functions_worker
2   2022-06-04 12:02:40,259                            azure_functions_worker
3   2022-06-04 12:02:40,261                            azure_functions_worker
4   2022-06-04 12:02:40,265                            azure_functions_worker
5   2022-06-04 12:02:43,000                            azure_functions_worker
6   2022-06-04 12:02:43,007                                              root
7   2022-06-04 12:02:43,008                                              root
8   2022-06-04 12:02:43,009                                              root
9   2022-06-04 12:02:43,010                                              root
10  2022-06-04 12:02:43,011                                              root
11  2022-06-04 12:02:43,041  azure.core.pipeline.policies.http_logging_policy
12  2022-06-04 12:02:43,564  azure.core.pipeline.policies.http_logging_policy
13  2022-06-04 12:02:44,070  azure.core.pipeline.policies.http_logging_policy
14  2022-06-04 12:02:44,226  azure.core.pipeline.policies.http_logging_policy
15  2022-06-04 12:09:07,090                                              root

   LogLevel                                         LogMessage
0      INFO  Successfully processed FunctionLoadRequest, re...
1      INFO  Received FunctionLoadRequest, request ID: 5bc6...
2      INFO  Successfully processed FunctionLoadRequest, re...
3      INFO  Received FunctionLoadRequest, request ID: 5bc6...
4      INFO  Successfully processed FunctionLoadRequest, re...
5      INFO  Received FunctionInvocationRequest, request ID...
6      INFO  Python HTTP trigger :: ProcessWebSaleExportFil...
7      INFO  Processing Request object started for the desi...
8      INFO  Processing Request object completed for the de...
9      INFO  Processing Request object started for the desi...
10     INFO  Processing Request object completed for the de...
11     INFO  Request URL: 'https://koxdsrssa.blob.core.wind...
12     INFO  Response status: 200\nResponse headers:\n    '...
13     INFO  Request URL: 'https://koxdsrssa.blob.core.wind...
14     INFO  Response status: 206\nResponse headers:\n    '...
15     INFO         Total time taken: 6 minutes and 24 seconds

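Be careful with StringIO objects: unlike a raw string, a StringIO is a stream with a current position, and pd.read_csv reads from that position onward. If the position is already at or past the end of the content, there is nothing left to parse.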
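To make the stream position concrete, here is a minimal sketch using only the standard library. The names `preloaded` and `written` are hypothetical and are not the `file` object from the answer. It assumes the difference that matters is how the buffer was filled: a StringIO built from an initial string starts at position 0, while one filled with write() is left at the end.

```python
from io import StringIO

# A StringIO built from an initial string starts at position 0.
preloaded = StringIO("a,b\n1,2\n")
start_position = preloaded.tell()

# A StringIO filled with write() is left positioned at the end of the content.
written = StringIO()
written.write("a,b\n1,2\n")
end_position = written.tell()

# read() starts from the current position, so nothing is returned here.
after_write = written.read()

# getvalue() returns the whole buffer regardless of the position.
full_content = written.getvalue()

print(start_position, end_position, repr(after_write), repr(full_content))
```

This would explain why getvalue() shows the full log text in the debugger while a read from the same object yields nothing.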

Here is an example with the same "file" object:

file.seek(20000)  # Move the stream position to offset 20000, past the end of the content.

df = pd.read_csv(
    file,
    names=[
        "Timestamp",
        "LogName",
        "LogLevel",
        "LogMessage",
    ],
)

print(df)
# Output
Empty DataFrame
Columns: [Timestamp, LogName, LogLevel, LogMessage]
Index: []

Whereas:

file.seek(0)  # Change the stream position to the beginning of the file

df = pd.read_csv(
    file,
    names=[
        "Timestamp",
        "LogName",
        "LogLevel",
        "LogMessage",
    ],
)

print(df)
# Output
   LogLevel                                         LogMessage  
0      INFO  Successfully processed FunctionLoadRequest, re...  
1      INFO  Received FunctionLoadRequest, request ID: 5bc6...  
2      INFO  Successfully processed FunctionLoadRequest, re...  
3      INFO  Received FunctionLoadRequest, request ID: 5bc6...  
4      INFO  Successfully processed FunctionLoadRequest, re...  
5      INFO  Received FunctionInvocationRequest, request ID...  
6      INFO  Python HTTP trigger :: ProcessWebSaleExportFil...  
7      INFO  Processing Request object started for the desi...  
8      INFO  Processing Request object completed for the de...  
9      INFO  Processing Request object started for the desi...  
10     INFO  Processing Request object completed for the de...  
11     INFO  Request URL: 'https://koxdsrssa.blob.core.wind...  
12     INFO  Response status: 200\nResponse headers:\n    '...  
13     INFO  Request URL: 'https://koxdsrssa.blob.core.wind...  
14     INFO  Response status: 206\nResponse headers:\n    '...  
15     INFO         Total time taken: 6 minutes and 24 seconds  
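The following sketch reproduces both symptoms from the question and shows two possible fixes. It assumes, without confirmation from the question, that csv_log_stream was filled by writing rows into it, which leaves the position at the end. The `log_stream` variable, the `COLUMNS` list, and the two sample rows are hypothetical stand-ins.

```python
import csv
from io import StringIO

import pandas as pd

COLUMNS = ["Timestamp", "LogName", "LogLevel", "LogMessage"]

# Hypothetical stand-in for csv_log_stream: rows are written into the buffer,
# which leaves the stream position at the end of the content.
log_stream = StringIO()
writer = csv.writer(log_stream)
writer.writerow(["2022-06-04 12:02:43,007", "root", "INFO", "first message"])
writer.writerow(["2022-06-04 12:02:43,008", "root", "INFO", "second message"])

# Reading from the end with header=None: there is no line to infer columns
# from, so pandas raises EmptyDataError.
try:
    pd.read_csv(log_stream, header=None)
    raised = False
except pd.errors.EmptyDataError:
    raised = True

# With explicit names the columns are known, so the result is an empty frame.
empty_df = pd.read_csv(log_stream, names=COLUMNS)

# Option 1: rewind the stream before reading.
log_stream.seek(0)
df_rewound = pd.read_csv(log_stream, names=COLUMNS)

# Option 2: wrap getvalue() in a fresh StringIO; getvalue() ignores the position.
df_copy = pd.read_csv(StringIO(log_stream.getvalue()), names=COLUMNS)

print(raised, len(empty_df), len(df_rewound), len(df_copy))
```

Rewinding with seek(0) avoids copying the buffer. Wrapping getvalue() in a new StringIO leaves the original stream's position untouched, which may matter if a logging handler is still writing to it.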