Can sqlite3_step stream the results of a query from disk in some cases, without loading them all into memory at once?
For example
SELECT * FROM my_table ORDER BY rowid
would this load all the results into memory before the first call to sqlite3_step has returned?
Even if no, are there other cases when all results would be loaded into memory at once? How can this be predicted ahead of time?
My aim here is to understand how SQLite uses memory internally, with a view to using it in a memory constrained environment
CodePudding user response:
After some testing, I think the answer is yes, it can stream results.
In https://github.com/michalc/sqlite-s3-query/, sqlite3_step is used under the hood to fetch results from a SQLite file via range HTTP requests to S3. In https://github.com/michalc/sqlite-s3-query/pull/25/files a custom HTTP client is setup to store how many rows of a result are fetched when each HTTP request is made.
In the above, the query
SELECT my_col_a FROM my_table
Allows the assertion to pass
self.assertEqual(rows_yielded_at_request, [
(0, None),
(0, 'bytes=0-99'),
(0, 'bytes=0-4095'),
(0, 'bytes=24-39'),
(0, 'bytes=4096-8191'),
(0, 'bytes=8192-12287'),
(140, 'bytes=12288-16383'),
(276, 'bytes=16384-20479'),
(412, 'bytes=20480-24575'),
])
shows that yielding of rows to client code is intermingled with the fetching of the raw data from the SQLite file.
And in https://github.com/michalc/sqlite-s3-query/pull/27/files, the query
SELECT my_col_a FROM my_table ORDER BY my_col_a
makes the assertion pass
self.assertEqual(rows_yielded_at_request, [
(0, None),
(0, 'bytes=0-99'),
(0, 'bytes=0-4095'),
(0, 'bytes=24-39'),
(0, 'bytes=4096-8191'),
(0, 'bytes=8192-12287'),
(0, 'bytes=12288-16383'),
(0, 'bytes=16384-20479'),
(0, 'bytes=20480-24575'),
])
showing that in this case it's not streaming. In this case, there is no index on the column my_col_a
, so SQLite has to fetch all the rows before knowing which one could be first.