Home > Back-end >  "COUNT field incorrect or syntax error" when passing a long list of values to `.in_()`
"COUNT field incorrect or syntax error" when passing a long list of values to `.in_()`

Time:01-06

I am trying to add a very basic SQL query in my flask API project. I am using SQLAlchemy as the database manipulation tool.

The query I want to run is the following:

SELECT * from trip_metadata where trip_id in ('trip_id_1', 'trip_id_2', ..., 'trip_id_n')

So, in my code, I wrote:

trips_ids = ['trip_id_1', 'trip_id_2', ..., 'trip_id_n']
result = session.query(dal.trip_table).filter(dal.trip_table.columns.trip_id.in_(trips_ids)).all()

When n is low, let'say n=10, it works very well. I get the expected result. However, when n is high, let's say n > 1000, it crashes. I am very surprised as it seems usual to put many values in the filter.

from sqlalchemy import text
result = session.execute(text(f"SELECT * FROM trip_metadata where trip_id in {trip_ids_tuple}"))

The error log is:

sqlalchemy.exc.DBAPIError: (pyodbc.Error) ('07002', '[07002] [Microsoft][ODBC Driver 17 for SQL Server]COUNT field incorrect or syntax error (0) (SQLExecDirectW)')')
[SQL: SELECT * FROM trip_metadata 
WHERE trip_metadata.trip_id IN (?, ?, ..., ?)]
[parameters: ('ABC12345-XXXX-XXXX-XXXX-000000000000', 'DEF12345-XXXX-XXXX-XXXX-000000000000', ..., 'GHI12345-XXXX-XXXX-XXXX-000000000000')]
(Background on this error at: https://sqlalche.me/e/14/dbapi)
127.0.0.1 - - [05/Jan/2023 10:35:48] "POST /api/v1/tripsAggregates HTTP/1.1" 500 -

However when I write the raw request, it works well, even when n is very high:

from sqlalchemy import text
trip_ids_tuple = ('trip_id_1', 'trip_id_2', ..., 'trip_id_n')
result = session.execute(text(f"SELECT * FROM trip_metadata where trip_id in {trip_ids_tuple}"))

But I don't think this is a good way of doing because I have much more complex requests to write and using sqlalchemy filters is more adapted.

Do you have any idea to fix my issue keeping using sqlalchemy library ? Thank you very much

CodePudding user response:

From the error, it could indicate a formatting issue (escape string characters properly?). This can happen when N is small, the data that breaks the formatting has low chance of occurring. When N gets large, there's more likelihood there is "bad data" that sqlalchemy tries to put into the query. Can't say for certain here, it might be a memory or operating system issue too.

  1. First thing to ask here, do you need to have tuples provided externally into the query? Is there a way to query for the the trip_ids via a join? Usually it's best to push operations to the SQL engine, but this isn't always possible if you're getting the tuples/lists of id's elsewhere.

  2. Rule out if there's a data issue that errors out during the execute(). Look into escaping the string values. You can try chunking the list into smaller bits to narrow down the potentially problematic values (see appendix below)

Try a different way to string format the query.

sql = f"SELECT * FROM trip_metadata where trip_id in ({','.join(trip_ids_tuple)})"

sql output str value:

'SELECT * FROM trip_metadata where trip_id in (trip_id_1,trip_id_2,trip_id_n)'

Appendix:

You can potentially build in a chunker mechanism to break. The crashing might be a operating system or memory issue. For Example, you can use list slicing:

def chunker(seq, size):
    return (seq[pos:pos   size] for pos in range(0, len(seq), size))

# Example usage
my_list = [1, 2, 3, 4, 5, 6, 7, 8]
chunk_size = 3
for chunk in chunker(my_list, chunk_size):
    print(chunk)

output

[1, 2, 3]
[4, 5, 6]
[7, 8]

Use a chunk size where N is manageable. This will also help narrow down potentially problematic str values that errors out.

CodePudding user response:

Microsoft's ODBC drivers for SQL Server execute statements using a system stored procedure on the server (sp_prepexec or sp_prepare). Stored procedures on SQL Server are limited to 2100 parameter values, so with a model like

class Trip(Base):
    __tablename__ = "trip"
    id = Column(String(32), primary_key=True)

this code will work

with Session(engine) as session:
    trips_ids = ["trip_id_1", "trip_id_2"]

    q = session.query(Trip).where(Trip.id.in_(trips_ids))
    results = q.all()
    """SQL emitted:
    SELECT trip.id AS trip_id 
    FROM trip 
    WHERE trip.id IN (?, ?)
    [generated in 0.00092s] ('trip_id_1', 'trip_id_2')
    """

because it only has two parameter values. If the length of the trips_ids list is increased to thousands of values the code will eventually fail.

One way to avoid the issue is to have SQLAlchemy construct an IN clause with literal values instead of parameter placeholders:

    q = session.query(Trip).where(
        Trip.id.in_(bindparam("p1", expanding=True, literal_execute=True))
    )
    results = q.params(p1=trips_ids).all()
    """SQL emitted:
    SELECT trip.id AS trip_id 
    FROM trip 
    WHERE trip.id IN ('trip_id_1', 'trip_id_2')
    [generated in 0.00135s] ()
    """
  • Related