I have an access table called "Cell_list" with a key column called "Cell_#". I want to read the table into a dataframe, but only the rows that match indices which are specified in a python list "cell_numbers". I tried several variations on:
import pyodbc
import pandas as pd
cell_numbers = [1,3,7]
cnn_str = r'Driver={Microsoft Access Driver (*.mdb,*.accdb)};DBQ=C:\folder\myfile.accdb;'
conn = pyodbc.connect(cnn_str)
query = ('SELECT * FROM Cell_list WHERE Cell_# in ' tuple(cell_numbers))
df = pd.read_sql(query, conn)
But no matter what I try I get a syntax error. How do I do this?
CodePudding user response:
Convert (join) cell_numbers
to text:
cell_text = '(1,3,7)'
and concatenate this.
The finished SQL should read (you may need brackets around the weird field name Cell_#
):
SELECT * FROM Cell_list WHERE [Cell_#] IN (1,3,7)
CodePudding user response:
Consider best practice of parameterization which is supported in pandas.read_sql
:
# PREPARED STATEMENT, NO DATA
query = (
'SELECT * FROM Cell_list '
'WHERE [Cell_#] IN (?, ?, ?)'
)
# RUN SQL WITH BINDED PARAMS
df = pd.read_sql(query, conn, params=cell_numbers)
Consider even dynamic qmark placeholders dependent on length of cell_numbers
:
qmarks = [', '.join('?' for _ in cell_numbers)]
query = (
'SELECT * FROM Cell_list '
f'WHERE [Cell_#] IN ({qmarks})'
)