I need to update delta table based on lookup file rows .
lookup file has two columns a_acc & b_acc, it will have mulitple rows. i need to update b_acc value in delta table based on a_acc column value in lookup in delta table .
Lookup table
------------
|a_acc|b_acc|
-----------
| 4636|1999|
| 1023|892 |
| 3333|1111|
Delta Table
------------
|a_acc|b_acc|
-----------
| 4636|0123|
| 1023|843 |
| 3333|3232|
output Delta table:
------------
|a_acc|b_acc|
-----------
| 4636|1999|
| 1023|892 |
| 3333|1111|
For single single row i can update the values in delta table .
deltaTable.update(condition = "a_acc = '4636'",set = {"b_acc": "1999"})
But how can i loop all the value in lookup table and update correspondingly ?
CodePudding user response:
You can use merge to update the values (b_acc) in delta table when matching key found in lookup table (a_acc) ,
Below is sample code for same
deltaTable.alias("dt").merge(
source = Lookuptable.alias("lt"),
condition = "dt.a_acc = lt.a_acc"
).whenMatchedUpdate(set =
{
"b_acc": "lt.b_acc"
}
).execute()
CodePudding user response:
hope this can help u.
d1 =[[4636, 1999],[1023,892],[3333,1111], [444, 123]]
lookup_table = spark.createDataFrame(d1, ['a_acc', 'b_acc'])
d2 =[[4636, 123],[1023, 843],[3333,3232], [567, 221]]
delta_table = spark.createDataFrame(d2, ['a_acc', 'b_acc'])
(
delta_table
.join(lookup_table.withColumnRenamed("b_acc", "lookup_b_acc"), ["a_acc"], "left")
.withColumn("b_acc", coalesce(col("lookup_b_acc"), col("b_acc")))
.sort("a_acc")
.show()
)
----- ----- ------------
|a_acc|b_acc|lookup_b_acc|
----- ----- ------------
| 567| 221| null|
| 1023| 892| 892|
| 3333| 1111| 1111|
| 4636| 1999| 1999|
----- ----- ------------