ParseException Syntax error when using Python ODBC with Cloudera Impala ODBC driver on Ubuntu

Time:05-09

We have a Python 3.7 application running on an AWS EC2 instance (Amazon Linux) that performs SQL queries against a Cloudera Impala service using pyodbc (4.0.27) and the Cloudera Impala ODBC driver (installed using ClouderaImpalaODBC-2.6.5.rpm). This application has been running successfully for several years.

I'm currently trying to get the application running in a Docker container running Ubuntu 18.04.4 LTS, but I am getting the following error when running even the most basic query (e.g. SELECT 'HELLO'):

Error: ('HY000', '[HY000] [Cloudera][ImpalaODBC] (110) Error while executing a query in Impala: [HY000] : ParseException: Syntax error in line 1:\\n\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\\n^\\nEncountered: Unexpected character\\nExpected: ALTER, COMMENT, COMPUTE, COPY, CREATE, DELETE, DESCRIBE, DROP, EXPLAIN, GRANT, INSERT, INVALIDATE, LOAD, REFRESH, REVOKE, SELECT, SET, SHOW, TRUNCATE, UPDATE, UPSERT, USE, VALUES, WITH\\n\\nCAUSED BY: Exception: Syntax error\\n\\x00\u6572\u3a64\u5520\u656e\u7078\u6365\u6574\\u2064\u6863\u7261\u6361\u6574\u0a72 (110) (SQLExecDirectW)')"}

Needless to say this looks like a string encoding problem.
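As a quick sanity check of that suspicion, here is a minimal sketch (not part of the original application) that takes the CJK-looking tail of the error text and reinterprets each 16-bit character as two single bytes. It assumes the doubly escaped `\\u2064` in the message stands for the character U+2064. The variable names are illustrative only.

```python
# Tail of the error text, copied from the message above.
# Assumption: the doubly escaped \\u2064 in the log is the character U+2064.
garbled_tail = (
    "\u6572\u3a64\u5520\u656e\u7078\u6365\u6574"
    "\u2064\u6863\u7261\u6361\u6574\u0a72"
)

# Write each character back out as one little-endian 16-bit unit, then read
# the resulting bytes one at a time as ASCII.
recovered = garbled_tail.encode("utf-16-le").decode("ascii")

print(repr(recovered))  # 'red: Unexpected character\n'
```

The tail turns out to be ordinary single-byte text ("...red: Unexpected character") that was read two bytes at a time, which is consistent with a character-width mismatch somewhere between the driver manager and the driver.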

Some context housekeeping:

  • the Python code on both systems (Amazon Linux / Ubuntu) is identical
  • the Impala ODBC driver installations on both systems have the same version (2.6.5); the Impala ODBC driver for Ubuntu was downloaded directly from the Cloudera website (https://www.cloudera.com/downloads/connectors/impala/odbc/2-6-5.html)
  • the Impala ODBC connection params are identical except for the OS specific items:
    • "HOST": "[host]"
    • "PORT": 21050
    • "Database": "[database]"
    • "UID": "[username]"
    • "PWD": "[password]"
    • "Driver": "{/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so}"
    • "UseSASL": 1
    • "AuthMech": 3
    • "SSL": 1
    • "CAIssuedCertNamesMismatch": 1
    • "TrustedCerts": "[path_to_certs_file]"
    • "TSaslTransportBufSize": 1000
    • "RowsFetchedPerBlock": 10000
    • "SocketTimeout": 0
    • "StringColumnLength": 32767
    • "UseNativeQuery": 0
  • The application appears to be connecting successfully to Impala as there is no error calling pyodbc.connect(**config, autocommit=True) or getting the cursor from the connection (have tried with invalid creds to make sure, and get the usual connection errors when creds are wrong). The details of the error message indicate the correct ODBC driver is being used
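For reference, the connection and test query described above can be sketched roughly as follows. This is a minimal illustration, not the actual application code: `build_config` and `run_hello` are hypothetical helper names, the bracketed placeholders from the list become function arguments, and `run_hello` needs pyodbc plus the Cloudera driver installed.

```python
def build_config(host, database, username, password, certs_path):
    """Return the connection parameters listed above as a dict (hypothetical helper)."""
    return {
        "HOST": host,
        "PORT": 21050,
        "Database": database,
        "UID": username,
        "PWD": password,
        "Driver": "{/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so}",
        "UseSASL": 1,
        "AuthMech": 3,
        "SSL": 1,
        "CAIssuedCertNamesMismatch": 1,
        "TrustedCerts": certs_path,
        "TSaslTransportBufSize": 1000,
        "RowsFetchedPerBlock": 10000,
        "SocketTimeout": 0,
        "StringColumnLength": 32767,
        "UseNativeQuery": 0,
    }


def run_hello(config):
    """Connect the same way the application does and run the basic test query."""
    import pyodbc  # third-party; imported here so build_config works without it

    connection = pyodbc.connect(**config, autocommit=True)
    try:
        cursor = connection.cursor()
        cursor.execute("SELECT 'HELLO'")  # the statement that fails on Ubuntu
        return cursor.fetchone()[0]
    finally:
        connection.close()
```

On the Ubuntu container the connect call succeeds and the `cursor.execute` call is where the ParseException is raised.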

I have tried playing around with different values for the Impala ODBC driver param "DriverManagerEncoding" such as "UTF-16", "UTF-32" or not having it at all (which is the case for the Amazon Linux setup) but always get the same error.

I also tried troubleshooting with the unixODBC command-line tool iusql on both systems. I was able to connect successfully from the Amazon Linux system, but could never connect on Ubuntu, where I consistently get the following (not sure if this is related or a separate issue):

iusql -v [DSN]
[unixODBC][
[ISQL]ERROR: Could not SQLDriverConnect

CodePudding user response:

Found the culprit - it was the setting DriverManagerEncoding in /opt/cloudera/impalaodbc/lib/64/cloudera.impalaodbc.ini:

[Driver]

## - Note that this default DriverManagerEncoding of UTF-32 is for iODBC.
## - unixODBC uses UTF-16 by default.
## - If unixODBC was compiled with -DSQL_WCHART_CONVERT, then UTF-32 is the correct value.
##   Execute 'odbc_config --cflags' to determine if you need UTF-32 or UTF-16 on unixODBC
## - SimbaDM can be used with UTF-8 or UTF-16.
##   The DriverUnicodeEncoding setting will cause SimbaDM to run in UTF-8 when set to 2 or UTF-16 when set to 1.

DriverManagerEncoding=UTF-32
ErrorMessagesPath=/opt/cloudera/impalaodbc/ErrorMessages/
LogLevel=0
LogPath=
SwapFilePath=/tmp


## - Uncomment the ODBCInstLib corresponding to the Driver Manager being used.
## - Note that the path to your ODBC Driver Manager must be specified in LD_LIBRARY_PATH (LIBPATH for AIX).
## - Note that AIX has a different format for specifying its shared libraries.

# Generic ODBCInstLib
#   iODBC
# ODBCInstLib=libiodbcinst.so

#   SimbaDM / unixODBC
#ODBCInstLib=libodbcinst.so

# AIX specific ODBCInstLib
#   iODBC
#ODBCInstLib=libiodbcinst.a(libiodbcinst.so.2)

#   SimbaDM
#ODBCInstLib=libodbcinst.a(odbcinst.so)

#   unixODBC
ODBCInstLib=libodbcinst.a(libodbcinst.so.1)

This file was autogenerated as part of the installation of the driver. Note the comments about iODBC vs unixODBC - we have installed only the latter.
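The ini comments suggest running `odbc_config --cflags` to decide between UTF-16 and UTF-32 on unixODBC. A minimal sketch of that check in Python, assuming `odbc_config` is on the PATH (the function names are hypothetical, and the rule is simply the one stated in the ini comments):

```python
import subprocess


def encoding_from_cflags(cflags: str) -> str:
    """Apply the rule from the ini comments to the text of the compiler flags."""
    # Per the comments: built with -DSQL_WCHART_CONVERT means UTF-32,
    # otherwise unixODBC uses UTF-16.
    return "UTF-32" if "SQL_WCHART_CONVERT" in cflags else "UTF-16"


def detect_unixodbc_encoding() -> str:
    """Run odbc_config --cflags and classify its output."""
    result = subprocess.run(
        ["odbc_config", "--cflags"],
        check=True,
        capture_output=True,
        text=True,
    )
    return encoding_from_cflags(result.stdout)
```

Calling `detect_unixodbc_encoding()` inside the container should indicate which value, if any, belongs in DriverManagerEncoding.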

Once I commented out the DriverManagerEncoding=UTF-32 line, our Python app worked. It also fixed the problem with iusql (which is part of the unixODBC install).
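To illustrate why that one setting matters, here is a rough sketch in plain Python. It assumes unixODBC hands the statement over as UTF-16 (as the ini comment says) while the driver, told UTF-32, reads it four bytes at a time. It does not reproduce the driver's real internals, only the width mismatch.

```python
query = "SELECT 'HELLO'"

# What a UTF-16 driver manager would pass along: 2 bytes per character.
wire_bytes = query.encode("utf-16-le")

# What a driver expecting UTF-32 would make of it: 4 bytes per character.
# Each 4-byte group here lies outside the Unicode range, so the "replace"
# error handler substitutes U+FFFD for every group.
misread = wire_bytes.decode("utf-32-le", errors="replace")

print(len(wire_bytes), repr(misread))
```

The 14-character statement becomes 28 bytes, which are misread as 7 invalid code points. That matches the seven `\ufffd` characters Impala reported on line 1 of the statement in the original error.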

Bonus content:

I had also come across a problem with isql (not iusql) - I was getting this error/output for the command isql -v [DSN]:

[S1000][unixODBC][Cloudera][ODBC] (11560) Unable to locate SQLGetPrivateProfileString function.
[ISQL]ERROR: Could not SQLConnect

The error is related to the config param ODBCInstLib in the same ini file. Once I changed it from the default libodbcinst.a(libodbcinst.so.1) to /usr/lib/x86_64-linux-gnu/libodbcinst.so, it worked. I found the answer in this post, which also helped me solve my original problem:
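If these ini changes need to be applied automatically (for example while building the Docker image), something along these lines could work. It is only a sketch: `comment_out` and `set_option` are hypothetical helpers that operate on the ini text, and reading and writing the file itself is left out.

```python
import re


def comment_out(ini_text: str, key: str) -> str:
    """Prefix every active 'key=...' line with '#'."""
    pattern = re.compile(rf"^([ \t]*{re.escape(key)}[ \t]*=.*)$", re.MULTILINE)
    return pattern.sub(r"#\1", ini_text)


def set_option(ini_text: str, key: str, value: str) -> str:
    """Replace the active 'key=...' line, or append one if none is active."""
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    line = f"{key}={value}"
    # A callable replacement avoids backslash handling in the value.
    new_text, count = pattern.subn(lambda match: line, ini_text)
    if count == 0:
        new_text = ini_text.rstrip("\n") + "\n" + line + "\n"
    return new_text


# Tiny excerpt of the ini file shown earlier, used as sample input.
sample = (
    "[Driver]\n"
    "DriverManagerEncoding=UTF-32\n"
    "#ODBCInstLib=libodbcinst.so\n"
    "ODBCInstLib=libodbcinst.a(libodbcinst.so.1)\n"
)

patched = comment_out(sample, "DriverManagerEncoding")
patched = set_option(
    patched, "ODBCInstLib", "/usr/lib/x86_64-linux-gnu/libodbcinst.so"
)
print(patched)
```

Lines that are already commented out are left untouched, so the alternative ODBCInstLib entries in the generated file stay as they are.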

Can't connect to snowflake via unixODBC. Error: [S1000][unixODBC][Snowflake][ODBC] (11560) Unable to locate SQLGetPrivateProfileString function
