Is there a way to insert a one to many using Pandas and SQLAlchemy for PostgreSQL using xlsx?

Time:07-25

I have two tables. The second has an FK back to the first. This is a one to many relationship.

Body:

body_id  body                       add_date
1        alien butts drive me nuts  2022-01-01

Body_Tags:

body_tag_id  tag_id  body_id
1            4       1
2            5       1
3            8       1

I have an xlsx file that has these columns:

body add_date tag1 tag2 tag3

I want to use Pandas and SQLAlchemy to insert the first two columns into Body as one row, then insert tag1, tag2, and tag3 into the Body_Tags table, each as its own row and each carrying the body_id generated for the Body row.

I have SQLAlchemy data models that match my database, I can query using the models, and my FK and relationships work too. I can use Pandas to pull the xlsx file into a dataframe. I just don't know how I would insert a one-to-many like this: to_sql() inserts an entire dataframe at once, so getting the corresponding IDs wouldn't work with to_sql() as far as I know.

Any way to achieve my goal?

CodePudding user response:

According to docs and other people it's impossible.

CodePudding user response:

It's not possible to do with a single .to_sql() call, but it certainly can be done.

Say we have a class

import sqlalchemy as sa
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Body(Base):
    __tablename__ = "body"
    body_id = sa.Column(sa.Integer, primary_key=True)
    body = sa.Column(sa.String(255))
    add_date = sa.Column(sa.Date)
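For completeness, the child table from the question could be mapped the same way. This is just a sketch: the column names follow the Body_Tags table above, and the separate `Base` exists only to keep the snippet self-contained (in the real project it would be the same `Base` as the other models).

```python
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base

Base = declarative_base()  # stand-in for the Base used by the existing models

class BodyTag(Base):
    __tablename__ = "body_tags"
    body_tag_id = sa.Column(sa.Integer, primary_key=True)
    tag_id = sa.Column(sa.Integer)
    # FK back to the parent table, as described in the question
    body_id = sa.Column(sa.Integer, sa.ForeignKey("body.body_id"))
```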

The table "body" exists and already contains rows. We also have the DataFrame that we imported from Excel:

print(df_from_xlsx)
"""
    body    add_date  tag1  tag2  tag3
0  body1  2022-07-21    11    12    13
1  body2  2022-07-22    21    22    23
2  body3  2022-07-23    31    32    33
"""
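If you want to experiment without the spreadsheet, the same frame can be built by hand. In practice it would come from pd.read_excel; the file name in the comment is hypothetical:

```python
import pandas as pd

# In practice: df_from_xlsx = pd.read_excel("input.xlsx")  # hypothetical path
df_from_xlsx = pd.DataFrame(
    {
        "body": ["body1", "body2", "body3"],
        "add_date": ["2022-07-21", "2022-07-22", "2022-07-23"],
        "tag1": [11, 21, 31],
        "tag2": [12, 22, 32],
        "tag3": [13, 23, 33],
    }
)
print(df_from_xlsx.shape)  # (3, 5)
```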

First we need to insert the rows into the parent table ("body"), retrieve the generated PK ("body_id") values, and add them as a column to the DataFrame.

with engine.begin() as conn:
    body_tbl = Body.__table__
    body_param_values = [
        dict(body=row.body, add_date=row.add_date)
        for row in df_from_xlsx.itertuples(index=False)
    ]
    # insert parent rows and retrieve the generated PK values
    # (RETURNING with a list of parameter dicts relies on SQLAlchemy 2.0's
    # "insertmanyvalues" support on PostgreSQL)
    new_body_ids = conn.scalars(
        body_tbl.insert().returning(body_tbl.c.body_id), body_param_values
    ).all()
    # add parent PK values as new column in DataFrame
    df_from_xlsx["new_body_id"] = new_body_ids
    print(df_from_xlsx)
    """
        body    add_date  tag1  tag2  tag3  new_body_id
    0  body1  2022-07-21    11    12    13          101
    1  body2  2022-07-22    21    22    23          102
    2  body3  2022-07-23    31    32    33          103
    """

Now we can extract the columns we need to insert into the child table

tag_rows = []
for row in df_from_xlsx.itertuples(index=False):
    tag_rows.append(dict(body_id=row.new_body_id, tag_id=row.tag1))
    tag_rows.append(dict(body_id=row.new_body_id, tag_id=row.tag2))
    tag_rows.append(dict(body_id=row.new_body_id, tag_id=row.tag3))
df_for_tags = pd.DataFrame(tag_rows)
print(df_for_tags)
"""
   body_id  tag_id
0      101      11
1      101      12
2      101      13
3      102      21
4      102      22
5      102      23
6      103      31
7      103      32
8      103      33
"""

and we can use .to_sql() to upload them.
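A minimal sketch of that last step, using an in-memory SQLite engine as a stand-in for the real PostgreSQL engine so the snippet runs on its own:

```python
import pandas as pd
import sqlalchemy as sa

# Stand-in for the real PostgreSQL engine
engine = sa.create_engine("sqlite://")

# Create the child table (in the real database it already exists)
with engine.begin() as conn:
    conn.execute(sa.text(
        "CREATE TABLE body_tags ("
        "body_tag_id INTEGER PRIMARY KEY, "
        "tag_id INTEGER, body_id INTEGER)"
    ))

# Same shape as df_for_tags built above (first parent row only, for brevity)
df_for_tags = pd.DataFrame({"body_id": [101, 101, 101], "tag_id": [11, 12, 13]})

# Append the child rows; index=False keeps the DataFrame index out of the table
df_for_tags.to_sql("body_tags", engine, if_exists="append", index=False)

with engine.connect() as conn:
    count = conn.execute(sa.text("SELECT COUNT(*) FROM body_tags")).scalar()
print(count)  # 3
```

if_exists="append" matters here: the default ("fail") would raise because the table already exists, and "replace" would drop it, losing the FK definition.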
