Suppose we have two data frames df1
and df2
with the following schema:
A
|-- B: struct (nullable = true)
| |-- b1: string (nullable = true)
| |-- b2: string (nullable = true)
| |-- b3: string (nullable = true)
| |-- C: array (nullable = true)
| | |-- D: struct (containsNull = true)
| | | |-- d1: string (nullable = true)
| | | |-- d2: string (nullable = true)
Would df1.union(df2)
work for these nested data frames if you wanted to add a new record? Or would you have to flatten them first if you wanted to add a new record?
CodePudding user response:
This should work, here is a knowledge article by databricks https://kb.databricks.com/data/append-a-row-to-rdd-or-dataframe.html
and you won't need to flatten your struct fields.
PS: Please ensure your column are in same orders in both dataframe.