I have the following tables, and I wanted to see if I could perform an insert, where for each inserted row, the value of the linked column gets selected by the other one using SQL alchemy and the insert function. The tables are:
from sqlalchemy import MetaData, Table, Column, Integer, VARCHAR, ForeignKey, create_engine
SQL_ENGINE = create_engine(...)
metadata_obj = MetaData(bind=SQL_ENGINE, schema="collections")
flows = Table(
"flows", metadata_obj,
Column("id", Integer, autoincrement=True, primary_key=True),
Column("voyage_name", VARCHAR(100), nullable=False, unique=True),
Column("vessel_name", VARCHAR(length=100), nullable=True, unique=False),
Column("origin_country_id", Integer, ForeignKey("metadata.tbl_country.id"), nullable=True,
unique=False),
Column("destination_country_id", Integer, ForeignKey("metadata.tbl_country.id"),
nullable=True, unique=False))
country = Table(
"country", metadata_obj,
Column("id", Integer, autoincrement=True, primary_key=True),
Column("country_name", VARCHAR(100), nullable=False, unique=False),
Column("country_iso", VARCHAR(2), nullable=False, unique=False)
)
When I extract the values, I get the following data as a pandas dataframe:
voyage_name | vessel_name | origin_country_iso | destination_country_iso |
---|---|---|---|
voyagename1 | vesselname1 | US | GB |
voyagename2 | vesselname2 | BR | FR |
voyagename3 | vesselname3 | NL | CY |
Which I want to insert into my flows table. I then convert the above dataframe to a list of dictionaries that looks like this:
insert_data = [
{
"voyage_name": "voyagename1",
"vessel_name": "vesselname1",
"origin_country_iso": "US",
"destination_country_iso": "GB"
},
{
"voyage_name": "voyagename2",
"vessel_name": "vesselname2",
"origin_country_iso": "BR",
"destination_country_iso": "FR"
},
{
"voyage_name": "voyagename3",
"vessel_name": "vesselname3",
"origin_country_iso": "NL",
"destination_country_iso": "CY"
}
]
If written in SQL I could do the following to upload the data into my flows table:
INSERT INTO flows (
voyage_name,
vessel_name,
origin_country_id,
destination_country_id
)
VALUES
('voyagename1', 'vesselname1',
(SELECT id FROM country WHERE country_iso = 'US'), (SELECT id FROM country WHERE country_iso = 'GB')
),
('voyagename2', 'vesselname2', (SELECT id FROM country WHERE country_iso = 'BR'),
(SELECT id FROM country WHERE country_iso = 'FR')
),
('voyagename1', 'vesselname3',
(SELECT id FROM country WHERE country_iso = 'NL'), (SELECT id FROM country WHERE country_iso = 'CY')
)
How would I do the equivalent insert statement using SQLalchemy and the insert statement, with values fed by my dictionary?
from sqlalchemy.dialects.postgresql import insert
flows_insert_stmt = insert(flows).values(insert_data)
NOTE: Database is PostgreSQL.
CodePudding user response:
Assuming that you already have all iso values in that country table and no misses, one way to construct such a query statement is like this
from sqlalchemy import insert, select
for row in insert_data:
stmt = insert(flows).values(
voyage_name=row['voyage_name'],
vessel_name=row['vessel_name'],
origin_country_id=select(country.c.id).where(country.c.country_iso==row['origin_country_iso']).scalar_subquery(),
destination_country_id=select(country.c.id).where(country.c.country_iso==row['destination_country_iso']).scalar_subquery(),
)
print(stmt)
A statement produced by this script is:
INSERT INTO collections.flows (voyage_name, vessel_name, origin_country_id, destination_country_id) VALUES (%(voyage_name)s, %(vessel_name)s, (SELECT collections.country.id
FROM collections.country
WHERE collections.country.country_iso = %(country_iso_1)s), (SELECT collections.country.id
FROM collections.country
WHERE collections.country.country_iso = %(country_iso_2)s)) RETURNING collections.flows.id
The use of scalar_subquery is to force one value result