How to add columns to a Delta table using Python

Time:10-25

I have a Delta table, created like this:

# Load the data from its source.
df = spark.read.load("/databricks-datasets/learning-spark-v2/people/people-10m.delta")

# Write the data to a table.
table_name = "people_10m"
df.write.saveAsTable(table_name)

I now have a schema change to apply: maybe a single column, maybe a few columns, maybe nested arrays. I can't predict what will come up during code execution.

I used Python's set API to find the new columns, and now I want to add them to the Delta table, ideally through the Python API.

One thought was to modify the schema of the DataFrame and then somehow tell the table to match. I don't want to read the whole dataset and rewrite it, and I don't want to lose the table history either. Schema evolution would be acceptable if it can be done as a schema-only update (no data written) and with all column deletions blocked.
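The set comparison mentioned above can be sketched with plain Python sets over column-name lists; the column names here are illustrative, and in practice the two sets would come from something like `set(new_df.columns)` and `set(spark.table(table_name).columns)`:

```python
# Hypothetical sketch: compare the incoming DataFrame's columns against the
# existing table's columns using Python's set API. Names are illustrative.
existing_cols = {"id", "firstName", "lastName", "birthDate"}          # from the table
incoming_cols = {"id", "firstName", "lastName", "birthDate",
                 "middleName", "salary"}                              # from the new data

added_cols = incoming_cols - existing_cols    # columns to add to the table
removed_cols = existing_cols - incoming_cols  # must be empty: no deletions allowed

print(added_cols)    # → {'middleName', 'salary'}
print(removed_cols)  # → set()
```

If `removed_cols` is non-empty, the incoming schema would drop columns, which is exactly the case the question wants to block.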

CodePudding user response:

The solution that worked was to create an empty DataFrame with the new schema (no removed columns, only additions), then append it to the table with schema evolution enabled. Appending zero rows leaves the existing data and history intact while the table schema picks up the new columns.
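A minimal sketch of that approach, assuming an active Spark session with Delta Lake support; the schema and column names are illustrative (only the two "new" columns matter, the rest stand in for the table's existing columns):

```python
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Hypothetical evolved schema: the table's existing columns plus the new
# columns discovered via the set comparison. Names are illustrative.
new_schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("firstName", StringType(), True),
    StructField("middleName", StringType(), True),  # new column
    StructField("salary", IntegerType(), True),     # new column
])

# An empty DataFrame carries the schema but no rows.
empty_df = spark.createDataFrame([], new_schema)

# Appending with mergeSchema adds the new columns to the table schema.
# mergeSchema only ever adds columns on append; it never drops existing ones,
# and the write is recorded as a new table version, so history is preserved.
(empty_df.write
    .format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .saveAsTable("people_10m"))
```

Because no rows are written, this is effectively a schema-only update: the data files are untouched and the change shows up as one new entry in the table's history.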
