AWS Glue Job upsert from one db table to annother db table-CodePudding

I am trying to create a pretty basic Glue job.

I have two different AWS RDS Mariadb's, with two similar tables (field names are different).

I would like to transform the data from table A so it fits with table B schema (this seems pretty trivial and is working).

And then i would like to update all existing entries (on a specific key) and insert all non existing ones.

Ive used the basic transform job which works fine if table b is empty and the insertion works fine (AWS roles / permissions / ports are fine)

But i get an expected duplicate key error, because it is just trying to insert.

I am very unsure what the simplest solution is, and where i can read about it.

The key on which table B should be updated is central_requisition_id (which is the pk in table A, but not in table B)

schemaapplymapping= ApplyMapping.apply(
frame=some_frame,
mappings=[
    ("supplier_id", "int", "central_parent_supplier_id", "int"),
    ("description", "string", "description", "string"),
    ("id", "int", "central_requisition_id", "int"),
],
transformation_ctx="schemaapplymapping",)

CodePudding user response：

I'm not sure on your specific needs but you can get around the duplicate key error by setting the write mode to overwrite

df.write.format('jdbc').options(url = dest_jdbc_url, 
                                      user = username,
                                      password = password,
                                      dbtable = dest ).mode("overwrite").save()