I have two tables. The second has an FK back to the first. This is a one to many relationship.
Body:
body_id | body | add_date |
---|---|---|
1 | alien butts drive me nuts | 2022-01-01 |
Body_Tags:
body_tag_id | tag_id | body_id |
---|---|---|
1 | 4 | 1 |
2 | 5 | 1 |
3 | 8 | 1 |
I have an xlsx file that has these columns;
body | add_date | tag1 | tag2 | tag3 |
---|
I want to use Pandas and SQLAlchemy to insert the first two columns into Body as one row. Then insert tag1, tag2, tag3 into the Tags table all as their own row - each having the id from the Body table.
I have SQLAlchemy data models that match my database and I can query using the models and my FK and relationships work too. I can use Pandas to pull the xlsx file into a dataframe. I just don't know how I would insert a one to many like this. to_sql() inserts an entire dataframe at once. So getting the corresponding IDs wouldn't work with to_sql as far as I know.
Any way to achieve my goal?
CodePudding user response:
According to docs and other people it's impossible.
It's not possible to do with a single .to_sql()
call, but it certainly can be done.
Say we have a class
class Body(Base):
__tablename__ = "body"
body_id = sa.Column(sa.Integer, primary_key=True)
body = sa.Column(sa.String(255))
add_date = sa.Column(sa.Date)
The table "body" exists and already contains rows. We also have the DataFrame that we imported from Excel:
print(df_from_xlsx)
"""
body add_date tag1 tag2 tag3
0 body1 2022-07-21 11 12 13
1 body2 2022-07-22 21 22 23
2 body3 2022-07-23 31 32 33
"""
First we need to insert the rows into the parent table ("body"), retrieve the generated PK ("body_id") values, and add them as a column to the DataFrame.
with engine.begin() as conn:
body_tbl = Body.__table__
body_param_values = [
dict(body=row.body, add_date=row.add_date)
for row in df_from_xlsx.itertuples(index=False)
]
# insert parent rows and retrieve PK values
new_body_ids = conn.scalars(
body_tbl.insert().returning(body_tbl.c.body_id), body_param_values
).all()
# add parent PK values as new column in DataFrame
df_from_xlsx["new_body_id"] = new_body_ids
print(df_from_xlsx)
"""
body add_date tag1 tag2 tag3 new_body_id
0 body1 2022-07-21 11 12 13 101
1 body2 2022-07-22 21 22 23 102
2 body3 2022-07-23 31 32 33 103
"""
Now we can extract the columns we need to insert into the child table
tag_rows = []
for row in df_from_xlsx.itertuples(index=False):
tag_rows.append(dict(body_id=row.new_body_id, tag_id=row.tag1))
tag_rows.append(dict(body_id=row.new_body_id, tag_id=row.tag2))
tag_rows.append(dict(body_id=row.new_body_id, tag_id=row.tag3))
df_for_tags = pd.DataFrame(tag_rows)
print(df_for_tags)
"""
body_id tag_id
0 101 11
1 101 12
2 101 13
3 102 21
4 102 22
5 102 23
6 103 31
7 103 32
8 103 33
"""
and we can use .to_sql()
to upload them.