Home > front end >  Databrick SQL - Insert data to a partitioned table
Databrick SQL - Insert data to a partitioned table

Time:11-13

I have a question about the best practice to insert incremental data to a partitioned table. Currently I have a historical table partitioned by day, I have another sql that produce the same table with two latest days' data. Here is a sample historical table and incremental table -

tbl1: historical

day sales
2021-01-01 10
2021-01-02 15
2021-01-03 12
2021-01-04 10

tbl2: Incremental table (as of 1/6/2021)

day sales
2021-01-04 25
2021-01-05 20

tbl1: Expected Output:

day sales
2021-01-01 10
2021-01-02 15
2021-01-03 12
2021-01-04 25
2021-01-05 20

I tried insert into and insert overwrite, but it seems the insert into will not replace the existing 1/4 data but will add additional line and create duplicates, while the insert overwrite will remove all data from 1/1 to 1/3.

Sample code I wrote: INSERT INTO tbl1 PARTITION (day) SELECT * FROM tbl2

CodePudding user response:

INSERT OVERWRITE rewrites a whole table or partition, but it works will many file formats. So to upsert using INSERT OVERWRITE you must first LEFT JOIN the new data with the existing data, and use that to replace the partition.

If you're using Delta you can MERGE instead.

CodePudding user response:

Not sure, what is your table format . but if your table is Parquet or delta format , handling this scenario is really simple . You can refer link

  • Related