Union for Nested Spark Data Frames

Suppose we have two data frames df1 and df2 with the following schema:

A
 |-- B: struct (nullable = true)
 |    |-- b1: string (nullable = true)
 |    |-- b2: string (nullable = true)
 |    |-- b3: string (nullable = true)
 |    |-- C: array (nullable = true)
 |    |    |-- D: struct (containsNull = true)
 |    |    |    |-- d1: string (nullable = true)
 |    |    |    |-- d2: string (nullable = true)

Would df1.union(df2) work for these nested data frames if you wanted to add a new record, or would you have to flatten them first?

CodePudding user response：

This should work, provided both data frames have the same schema. Databricks has a knowledge base article on appending rows: https://kb.databricks.com/data/append-a-row-to-rdd-or-dataframe.html

You won't need to flatten your struct fields.

PS: Make sure the columns are in the same order in both data frames, because union matches columns by position, not by name.