I am trying to create a pretty basic Glue job.
I have two different AWS RDS Mariadb's, with two similar tables (field names are different).
I would like to transform the data from table A so it fits with table B schema (this seems pretty trivial and is working).
And then i would like to update all existing entries (on a specific key) and insert all non existing ones.
Ive used the basic transform job which works fine if table b is empty and the insertion works fine (AWS roles / permissions / ports are fine)
But i get an expected duplicate key error, because it is just trying to insert.
I am very unsure what the simplest solution is, and where i can read about it.
The key on which table B should be updated is central_requisition_id (which is the pk in table A, but not in table B)
schemaapplymapping= ApplyMapping.apply(
frame=some_frame,
mappings=[
("supplier_id", "int", "central_parent_supplier_id", "int"),
("description", "string", "description", "string"),
("id", "int", "central_requisition_id", "int"),
],
transformation_ctx="schemaapplymapping",)
CodePudding user response:
I'm not sure on your specific needs but you can get around the duplicate key error by setting the write mode to overwrite
df.write.format('jdbc').options(url = dest_jdbc_url,
user = username,
password = password,
dbtable = dest ).mode("overwrite").save()