Why can't I insert more than 165 rows into a SQLite database asynchronously?


I was playing around with aiosqlite. I wrote this code to insert 1000 rows into a database:

import asyncio
import aiosqlite
import signal
signal.signal(signal.SIGINT, signal.SIG_DFL)
counter = 1


async def write_to_db(number):
    global counter
    db = await aiosqlite.connect('test_database.db')

    command = '''
                INSERT INTO "main"."person"
                ("first_name", "last_name", "age")
                VALUES ('sample_first_name', 'sample_last_name', 21);
              '''
    print(f'coroutine number {number + 1} connected to the database.')
    await db.execute(command)
    print(f'before commit in coroutine number: {number + 1}')
    # await asyncio.sleep(2)
    await db.commit()
    print(f'commit done in coroutine number: {number + 1}')
    await db.close()
    print('rows inserted so far:', counter)
    counter += 1


async def insert_many_times():
    await asyncio.gather(
        *[write_to_db(number) for number in range(1000)])


async def initialize_db():
    async with aiosqlite.connect('test_database.db') as db2:
        await db2.execute(
            '''
                CREATE TABLE "person" (
                "id"    INTEGER NOT NULL,
                "first_name"    TEXT NOT NULL,
                "last_name" TEXT NOT NULL,
                "age"   INTEGER NOT NULL,
                PRIMARY KEY("id" AUTOINCREMENT)
                );
            '''
        )
        await db2.commit()
    await insert_many_times()


loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
loop.run_until_complete(initialize_db())

This code manages to insert about 165 rows into the database, and after that I get a "database is locked" error:

coroutine number 1 connected to the database.
coroutine number 2 connected to the database.
coroutine number 3 connected to the database.

...

coroutine number 517 connected to the database.
coroutine number 518 connected to the database.
coroutine number 519 connected to the database.

...

coroutine number 998 connected to the database.
coroutine number 999 connected to the database.
coroutine number 1000 connected to the database.
before commit in coroutine number: 2
commit done in coroutine number: 2
rows inserted so far: 1
before commit in coroutine number: 855
commit done in coroutine number: 855
rows inserted so far: 2
before commit in coroutine number: 907
commit done in coroutine number: 907
rows inserted so far: 3

...

before commit in coroutine number: 285
commit done in coroutine number: 285
rows inserted so far: 163
before commit in coroutine number: 990
commit done in coroutine number: 990
rows inserted so far: 164
before commit in coroutine number: 387
commit done in coroutine number: 387
rows inserted so far: 165
Traceback (most recent call last):
  File "C:\Users\Ali Safapour\Desktop\1\1\test.py", line 54, in <module>
    loop.run_until_complete(initialize_db())
  File "C:\Users\Ali Safapour\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 646, in run_until_complete

...

  File "C:\Users\Ali Safapour\AppData\Local\Programs\Python\Python310\lib\site-packages\aiosqlite\core.py", line 102, in run
    result = function()
sqlite3.OperationalError: database is locked

As you can see, the coroutines were scheduled in the event loop in order: first coroutine number 1 connected to the database, then coroutine number 2 immediately after it, and so on. But after every coroutine has obtained its database object from aiosqlite.connect('test_database.db'), there is no particular order in which the SQL commands execute. This is the nature of asyncio: if we need to wait for I/O, we can do something else, and when the result is ready we continue. My questions are:

  1. Take coroutine number 1, the first coroutine scheduled in the event loop. Why didn't Python continue with the rest of that coroutine (i.e. running db.execute(command)) in between connecting the other coroutines to the database? The time spent awaiting db.execute(command) and db.commit() in the first coroutines is obviously less than the time spent waiting for coroutine number 1000 (the last coroutine to connect to the database) to get its database object.

  2. As you can see, Python stopped inserting rows once 165 rows had already been inserted. If this is bound to happen (no matter how many times you run this code, about 160 rows always get inserted), why did the first 165 inserts succeed without any problem?

  3. There is an await asyncio.sleep(2) in the write_to_db coroutine. If you uncomment it, you will see that only one or two records get inserted into the database. Why is that? Can somebody explain exactly how the coroutines are scheduled in the event loop in this example, and in what order?

I don't need just a quick fix to make this code work; I can do that myself. I need someone to explain exactly what is happening in the event loop and why this problem occurs.

CodePudding user response:

First, you're gaining nothing from having 1000 connections. Iterating over 1 connection 1000 times is much, much, much faster.

The main advantage of aiosqlite is that it won't block other coroutines while waiting for its queries to execute. Talking to SQLite is I/O; aiosqlite gives you asyncio for SQLite queries.

A SQL database server (PostgreSQL, MySQL, MS SQL, etc.) runs in a separate process, possibly on a different machine, probably multi-threaded. In that case it might make sense to open multiple connections (though not 1000); the SQL server can probably handle several queries simultaneously, and while that's happening, Python can be preparing and sending more. This is the value of asynchronous I/O: while one coroutine is waiting for the SQL server, the other coroutines can proceed. Sending a query to a SQL server is I/O.

But SQLite is not a server. SQLite runs in-process: aiosqlite hands each connection's work to a helper thread, but it's still your own process doing all of it, not a separate server. If you have 1000 coroutines with 1000 connections doing 1000 queries at the same time, they're all competing to do the same linear amount of work, except now you also have the overhead of managing 1000 coroutines and 1000 connections all trying to get an exclusive lock on the same table at the same time. There's a lot of overhead in swapping between 1000 coroutines.

Instead, use one connection and run it in a loop. This is far, far faster. We can make it even faster by waiting until all inserts are done before committing; writing to disk is expensive.

async def insert_many_times():
    counter = 1
    command = '''
                INSERT INTO "main"."person"
                ("first_name", "last_name", "age")
                VALUES ('sample_first_name', 'sample_last_name', 21);
              '''
    async with aiosqlite.connect('test_database.db') as db:
        for _ in range(1000):
            await db.execute(command)
            print('rows inserted so far:', counter)
            counter += 1
        # commit once at the end; committing after every insert is what
        # makes the naive version slow, since each commit writes to disk
        await db.commit()
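
If the row values vary, executemany() with parameter binding is the idiomatic way to batch the loop above; a sketch along the same lines (the row values are placeholders):

import asyncio
import aiosqlite

async def insert_many_rows():
    # one connection, one batched statement, one commit
    rows = [('sample_first_name', 'sample_last_name', 21)
            for _ in range(1000)]
    async with aiosqlite.connect('test_database.db') as db:
        await db.executemany(
            'INSERT INTO "main"."person" '
            '("first_name", "last_name", "age") VALUES (?, ?, ?);',
            rows,
        )
        await db.commit()

asyncio.run(insert_many_rows())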

You can tune the connection if necessary.
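
For example (a sketch, not part of the original answer): WAL journal mode is the usual pragma for easing lock contention between connections.

import asyncio
import aiosqlite

async def tuned_insert():
    async with aiosqlite.connect('test_database.db') as db:
        # WAL mode lets readers proceed while a writer holds the lock,
        # which eases (but does not eliminate) lock contention
        await db.execute('PRAGMA journal_mode=WAL')
        await db.execute(
            '''INSERT INTO "main"."person"
               ("first_name", "last_name", "age")
               VALUES ('sample_first_name', 'sample_last_name', 21);'''
        )
        await db.commit()

asyncio.run(tuned_insert())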


The coroutines are all running in one thread, so why aren't they executing in order? Because they're doing file operations.

When you insert and commit in SQLite, it has to do a bunch of complicated locking and file operations. Let's look at a simple example which illustrates the same behavior: 10 coroutines all opening a file and writing to it.

import asyncio
from aiofile import async_open

async def main():
    await asyncio.gather(
        *[write_to_file(number) for number in range(10)]
    )

async def write_to_file(number):
    print(f'{number + 1}: running')
    async with async_open(f'/tmp/hello{number}.txt', 'w') as afp:
        print(f'{number + 1}: opened file')
        await afp.write(f'Hello {number} world')
        print(f'{number + 1}: wrote to file')

asyncio.run(main())

When we run this, they all begin in order, but they open and write out of order.

1: running
2: running
3: running
4: running
5: running
6: running
7: running
8: running
9: running
10: running
1: opened file
3: opened file
2: opened file
4: opened file
5: opened file
8: opened file
7: opened file
6: opened file
10: opened file
9: opened file
1: wrote to file
3: wrote to file
2: wrote to file
4: wrote to file
5: wrote to file
8: wrote to file
6: wrote to file
10: wrote to file
7: wrote to file
9: wrote to file

Why?

Because I/O takes time. When a coroutine opens or writes a file, it has to ask the operating system and then wait for the operation to finish (known as "blocking"). Instead of waiting, Python's asyncio runs another coroutine and eventually gets back to checking the ones that were blocked. While all the coroutines run in a single thread, the order in which their file operations complete is up to the operating system.

Inserting and committing in SQLite is the same basic issue. Everything happens inside your one process, but each commit has to acquire locks and write to the disk. How long that takes, and who gets the lock next, is up to the operating system and to SQLite's code for resolving lock contention.


With that in mind, let's answer the individual questions.

Take coroutine number 1, the first coroutine scheduled in the event loop. Why didn't Python continue with the rest of that coroutine (i.e. running db.execute(command)) in between connecting the other coroutines to the database?

Probably because connecting doesn't do much. Opening a SQLite database takes no locks, so every connect() completes quickly and in order, and by the time the event loop gets back around to coroutine 1's execute(), it has already started all of the other coroutines.


As you can see, Python stopped inserting rows once 165 rows had already been inserted. If this is bound to happen (no matter how many times you run this code, about 160 rows always get inserted), why did the first 165 inserts succeed without any problem?

Each insert/commit has to get an exclusive lock (the gory details are in SQLite's file-locking documentation). While one connection holds the lock, nobody else can insert or commit. The default timeout for waiting on a lock is 5 seconds. In your case, 165 rows get inserted before one of the connections times out waiting for the lock.

I get about 400. If I increase the timeout to 10 seconds I get about 450, because the program becomes incredibly slow juggling hundreds of coroutines.
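
Raising the timeout is just a keyword argument; aiosqlite forwards it to the underlying sqlite3.connect (a sketch):

import asyncio
import aiosqlite

async def insert_with_longer_timeout():
    # wait up to 10 seconds (instead of the default 5) for the lock
    # before raising "database is locked"
    async with aiosqlite.connect('test_database.db', timeout=10) as db:
        await db.execute(
            '''INSERT INTO "main"."person"
               ("first_name", "last_name", "age")
               VALUES ('sample_first_name', 'sample_last_name', 21);'''
        )
        await db.commit()

asyncio.run(insert_with_longer_timeout())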

From the Django docs...

SQLite is meant to be a lightweight database, and thus can’t support a high level of concurrency. OperationalError: database is locked errors indicate that your application is experiencing more concurrency than sqlite can handle in default configuration. This error means that one thread or process has an exclusive lock on the database connection and another thread timed out waiting for the lock to be released.

Replace "thread" with "coroutine".


There is an await asyncio.sleep(2) in the write_to_db coroutine. If you uncomment it, you will see that only one or two records get inserted into the database. Why is that? Can somebody explain exactly how the coroutines are scheduled in the event loop in this example, and in what order?

N coroutines all try to get the lock.

A coroutine gets the lock, waits 2 seconds, commits, and releases the lock. Meanwhile, the other coroutines are waiting.

2 seconds have passed.

A coroutine gets the lock, waits 2 seconds, commits, and releases the lock. Meanwhile, the other coroutines are waiting.

4 seconds have passed.

A third coroutine gets the lock and begins its 2-second sleep. One second into that sleep, 5 seconds have passed since the remaining coroutines first started waiting, so one of them hits the 5-second timeout and raises the "database is locked" error.
