Deleting Delta format data from an S3 path

Time:01-13

I am writing files in the Delta format to AWS S3. Due to some corrupt data I need to delete it. I am using enterprise Databricks, which can access the AWS S3 path and has delete permission.

I am trying to delete the data with the script below:

val p = "s3a://bucket/path1/table_name"

import io.delta.tables._
import org.apache.spark.sql.functions._

val deltaTable = DeltaTable.forPath(spark, p)
deltaTable.delete("date > '2023-01-01'")

But it is not deleting the data in the S3 path that matches "date > '2023-01-01'". I waited for an hour but I still see the data, and I have run the script above multiple times.

So what is wrong here, and how do I fix it?

CodePudding user response:

The DELETE operation does not remove the data files from storage; it only removes the rows from the Delta table by dereferencing the underlying files from the latest table version. To physically delete the data from storage you have to run a VACUUM command:

Check: https://docs.databricks.com/sql/language-manual/delta-vacuum.html
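For illustration, a minimal sketch of running VACUUM through the Delta Lake Scala API, reusing the table path from the question (by default, VACUUM only removes unreferenced files older than the 7-day retention threshold):

import io.delta.tables._

val deltaTable = DeltaTable.forPath(spark, "s3a://bucket/path1/table_name")

// Physically remove files that are no longer referenced by the table
// and are older than the default 7-day retention threshold.
deltaTable.vacuum()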

CodePudding user response:

If you want to delete the data physically from S3 you can use dbutils.fs.rm("path").

If you just want to delete the data, run spark.sql("delete from table_name where cond"), or use the %sql magic command and run the DELETE statement.

You can also try the VACUUM command, but the default retention period is 7 days. If you want to delete data that is less than 7 days old, set this configuration: SET spark.databricks.delta.retentionDurationCheck.enabled = false; and then execute the VACUUM command.
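A minimal sketch of that sequence in Scala, assuming the table path from the question (note that disabling the retention check can break time travel and concurrent readers that still reference the old files):

// Assumption: session-level setting; only disable the check if no readers
// or writers still depend on older file versions.
spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")

import io.delta.tables._

val deltaTable = DeltaTable.forPath(spark, "s3a://bucket/path1/table_name")

// Physically remove unreferenced files, keeping 0 hours of history.
deltaTable.vacuum(0)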
