Home > database >  How to do this Not In operation without triggering an overflow in the marker amount of operations in
How to do this Not In operation without triggering an overflow in the marker amount of operations in

Time:11-24

This is a simplification of the case: I have two databases, a MySQL and a MS_Access. I am trying to delete all elements from the MsAccess that are not in the MySQL table but are still in MSAccess. I am using sqlalchemy to connect to both DB. To connect with MSAccess (I know, this database should not be used anymore, this is actually part of a migration process), I am using sqlalchemy-access, that internally works with pyodbc.

The code that does this operation is:

#every row in the mysql table contains a field that references its correspondent row in msaccess
mysql_ids = mysql_session.query(mysql_table.id_msaccess).all()
list_of_ids = [elem(0) for elem in mysql_ids]

delete_query = delete(access_table).where((access_table).id.not_in(list_of_ids))
results = access_session.execute(delete_query)

However, I get this error message:

(pyodbc.ProgrammingError) ('The SQL contains -9972 parameter markers, but 55564 parameters were supplied) DELETE FROM [access_table] WHERE ([access_table].[id] NOT IN (?, ?, ... <here there are all the 55564 parameter markers>) parameters: (241, 242, 243,...)

I have found this issue in pyodbc's github page:

Github Issue in Pyodbc

They essentially say that there is a marker counter that overflows in the internal implementation. They are talking about SQL Server but I guess the same thing happens here.

I could do this query in blocks of 32768 rows, or otherwise check for every element from the mysql table to see if it is in the ms-access table (I think this would be quite slow) but I wonder if there is not a better approach. Do you have any suggestions on how could I approach this?

Thanks in advance for any suggestions

CodePudding user response:

I could do this query in blocks of 32768 rows

That won't work for a NOT IN query. Say you had a list of rows to keep:

[1, 2, 3, 4, 5, 6]

If you tried to do that in batches of 3 then the first DELETE would be

DELETE FROM access_table WHERE id NOT IN (1, 2, 3)

which would delete the rows with id values of 4, 5, and 6. Then the next DELETE would be

DELETE FROM access_table WHERE id NOT IN (4, 5, 6)

which would delete the rows with id values of 1, 2, and 3.

However, you could build a list of rows to delete like this:

with mysql_engine.begin() as conn:
    mysql_existing = (
        conn.scalars(sa.select(mysql_table.c.id_msaccess)).all()
    )
    print(mysql_existing)  # [2, 3]

with access_engine.begin() as conn:
    access_existing = (
        conn.scalars(sa.select(access_table.c.id)).all()
    )
    print(access_existing)  # [1, 2, 3, 4, 5, 6]

access_to_delete = list(set(access_existing).difference(mysql_existing))
print(access_to_delete)  # [1, 4, 5, 6]

and you could process that list in batches by using IN instead of NOT IN.

  • Related