This is a simplification of the case: I have two databases, a MySQL and a MS_Access. I am trying to delete all elements from the MsAccess that are not in the MySQL table but are still in MSAccess. I am using sqlalchemy to connect to both DB. To connect with MSAccess (I know, this database should not be used anymore, this is actually part of a migration process), I am using sqlalchemy-access, that internally works with pyodbc.
The code that does this operation is:
#every row in the mysql table contains a field that references its correspondent row in msaccess
mysql_ids = mysql_session.query(mysql_table.id_msaccess).all()
list_of_ids = [elem(0) for elem in mysql_ids]
delete_query = delete(access_table).where((access_table).id.not_in(list_of_ids))
results = access_session.execute(delete_query)
However, I get this error message:
(pyodbc.ProgrammingError) ('The SQL contains -9972 parameter markers, but 55564 parameters were supplied) DELETE FROM [access_table] WHERE ([access_table].[id] NOT IN (?, ?, ... <here there are all the 55564 parameter markers>) parameters: (241, 242, 243,...)
I have found this issue in pyodbc's github page:
They essentially say that there is a marker counter that overflows in the internal implementation. They are talking about SQL Server but I guess the same thing happens here.
I could do this query in blocks of 32768 rows, or otherwise check for every element from the mysql table to see if it is in the ms-access table (I think this would be quite slow) but I wonder if there is not a better approach. Do you have any suggestions on how could I approach this?
Thanks in advance for any suggestions
CodePudding user response:
I could do this query in blocks of 32768 rows
That won't work for a NOT IN query. Say you had a list of rows to keep:
[1, 2, 3, 4, 5, 6]
If you tried to do that in batches of 3 then the first DELETE would be
DELETE FROM access_table WHERE id NOT IN (1, 2, 3)
which would delete the rows with id values of 4, 5, and 6. Then the next DELETE would be
DELETE FROM access_table WHERE id NOT IN (4, 5, 6)
which would delete the rows with id values of 1, 2, and 3.
However, you could build a list of rows to delete like this:
with mysql_engine.begin() as conn:
mysql_existing = (
conn.scalars(sa.select(mysql_table.c.id_msaccess)).all()
)
print(mysql_existing) # [2, 3]
with access_engine.begin() as conn:
access_existing = (
conn.scalars(sa.select(access_table.c.id)).all()
)
print(access_existing) # [1, 2, 3, 4, 5, 6]
access_to_delete = list(set(access_existing).difference(mysql_existing))
print(access_to_delete) # [1, 4, 5, 6]
and you could process that list in batches by using IN instead of NOT IN.