Home > Software engineering >  Adjust column values for insert into DataFrame
Adjust column values for insert into DataFrame

Time:03-01

At my company we have a SQL database that we connect to via pyodbc. Here, when we run a query, pyodbc returns a list of tuples. Typically, such a list contains >10 tuples. Here is an example of a possible output we get:

OUTPUT =
[(datetime.datetime(2003, 3, 26, 15, 12, 15), '490002_space'),
 (datetime.datetime(2003, 3, 27, 16, 13, 14), '490002_space')] 

My wish is to remove '_space' and 'datetime.datetime(...)' parts in all tuples we receive in OUTPUT. Eventually I hope to pass the new list of tuples to a pandas dataframe. I was hoping to get any of your advice on how to efficiently modify OUTPUT to DESIRED_OUTPUT:

DESIRED_OUTPUT:
[('2003, 3, 26, 15, 12, 15', '490002'),
 ('2003, 3, 27, 16, 13, 14', '490002')] 

Really hope to hear from you.

Greetings, Jerome

CodePudding user response:

Here is a possible solution:

result = [(str(dt.timetuple()[:6])[1:-1], s.split('_')[0]) for dt, s in OUTPUT]

CodePudding user response:

Eventually I hope to pass the new list of tuples to a pandas dataframe.

You can use .read_sql_query() to pull the information directly into a DataFrame:

import pandas as pd
import sqlalchemy as sa

connection_url = sa.engine.URL.create(
    "mssql pyodbc",
    username="scott",
    password="tiger^5HHH",
    host="192.168.0.199",
    database="test",
    query={
        "driver": "ODBC Driver 18 for SQL Server",
        "TrustServerCertificate": "Yes",
    }
)

engine = sa.create_engine(connection_url)

table_name = "so71297370"

# set up example environment
with engine.begin() as conn:
    conn.exec_driver_sql(f"DROP TABLE IF EXISTS {table_name}")
    conn.exec_driver_sql(f"CREATE TABLE {table_name} (col1 datetime2, col2 nvarchar(50))")
    conn.exec_driver_sql(f"""\
        INSERT INTO {table_name} (col1, col2) VALUES
        ('2003-03-26 15:12:15', '490002_space'), 
        ('2003-03-27 16:13:14', '490002_space')
    """)

# example
df = pd.read_sql_query(
    # suffix '_space' is 6 characters in length
    f"SELECT col1, LEFT(col2, LEN(col2) - 6) AS col2 FROM {table_name}",
    engine,
)
print(df)
"""
                 col1    col2
0 2003-03-26 15:12:15  490002
1 2003-03-27 16:13:14  490002
"""
  • Related