How to add array of struct to struct of array of struct in Spark Scala

Time:03-12

I have the example below:

val df_temp1 = Seq(
  ("1","Adam","Angra", "Anastasia")
).toDF("id","fname", "mname", "lname")
df_temp1.createOrReplaceTempView("df_temp1")

val df1 = spark.sql("""
  select id,
    named_struct('opi1', array(
      named_struct('data_description','fname','data_details',fname),
      named_struct('data_description','mname','data_details',mname),
      named_struct('data_description','lname','data_details',lname))) as pi,
    array(
      named_struct('data_description','fname','data_details',fname),
      named_struct('data_description','mname','data_details',mname),
      named_struct('data_description','lname','data_details',lname)) as opi2
  from df_temp1
""")
df1.printSchema
df1.show(false)
df1.createOrReplaceTempView("df1")

That gives the following output schema:

root
 |-- id: string (nullable = true)
 |-- pi: struct (nullable = false)
 |    |-- opi1: array (nullable = false)
 |    |    |-- element: struct (containsNull = false)
 |    |    |    |-- data_description: string (nullable = false)
 |    |    |    |-- data_details: string (nullable = true)
 |-- opi2: array (nullable = false)
 |    |-- element: struct (containsNull = false)
 |    |    |-- data_description: string (nullable = false)
 |    |    |-- data_details: string (nullable = true)

And the following result:

+---+-----------------------------------------------------+---------------------------------------------------+
|id |pi                                                   |opi2                                               |
+---+-----------------------------------------------------+---------------------------------------------------+
|1  |{[{fname, Adam}, {mname, Angra}, {lname, Anastasia}]}|[{fname, Adam}, {mname, Angra}, {lname, Anastasia}]|
+---+-----------------------------------------------------+---------------------------------------------------+

I want opi2 to be included alongside opi1 in pi, so the expected schema should look like this:

root
 |-- id: string (nullable = true)
 |-- pi: struct (nullable = false)
 |    |-- opi1: array (nullable = false)
 |    |    |-- element: struct (containsNull = false)
 |    |    |    |-- data_description: string (nullable = false)
 |    |    |    |-- data_details: string (nullable = true)
 |    |-- opi2: array (nullable = false)
 |    |    |-- element: struct (containsNull = false)
 |    |    |    |-- data_description: string (nullable = false)
 |    |    |    |-- data_details: string (nullable = true)

And the expected output has both arrays, opi1 and opi2, inside pi, like this:

+---+----------------------------------------------------------------------------------------------------------+
|id |pi                                                                                                        |
+---+----------------------------------------------------------------------------------------------------------+
|1  |{[{fname, Adam}, {mname, Angra}, {lname, Anastasia}], [{fname, Adam}, {mname, Angra}, {lname, Anastasia}]}|
+---+----------------------------------------------------------------------------------------------------------+

So basically I want to add an existing column to a struct. (I am using Spark 2.3, by the way, so functions introduced in Spark 2.4 cannot be used.)

CodePudding user response:

Just create a new struct from pi.opi1 and opi2 with named_struct:

val df2 = spark.sql("select id, named_struct('opi1',pi.opi1, 'opi2', opi2) as pi from df1")

df2.show(false)
df2.printSchema

+---+----------------------------------------------------------------------------------------------------------+
|id |pi                                                                                                        |
+---+----------------------------------------------------------------------------------------------------------+
|1  |{[{fname, Adam}, {mname, Angra}, {lname, Anastasia}], [{fname, Adam}, {mname, Angra}, {lname, Anastasia}]}|
+---+----------------------------------------------------------------------------------------------------------+

root
 |-- id: string (nullable = true)
 |-- pi: struct (nullable = false)
 |    |-- opi1: array (nullable = false)
 |    |    |-- element: struct (containsNull = false)
 |    |    |    |-- data_description: string (nullable = false)
 |    |    |    |-- data_details: string (nullable = true)
 |    |-- opi2: array (nullable = false)
 |    |    |-- element: struct (containsNull = false)
 |    |    |    |-- data_description: string (nullable = false)
 |    |    |    |-- data_details: string (nullable = true)
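
If you would rather stay in the DataFrame API than go through a temp view, the same rebuild can probably be written with struct from org.apache.spark.sql.functions, which does not depend on anything added in Spark 2.4. This is only a sketch of that idea, not part of the answer above: the local SparkSession, the app name, and the entry helper are assumptions added so the snippet runs on its own (in spark-shell the session already exists).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, lit, struct}

// Assumption: a local session for a standalone run; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("merge-into-struct").getOrCreate()
import spark.implicits._

val src = Seq(("1", "Adam", "Angra", "Anastasia")).toDF("id", "fname", "mname", "lname")

// Hypothetical helper: one {data_description, data_details} struct per source column.
def entry(name: String) =
  struct(lit(name).as("data_description"), col(name).as("data_details"))

val details = array(entry("fname"), entry("mname"), entry("lname"))

// Same shape as df1 in the question: pi.opi1 nested, opi2 at the top level.
val df1 = src.select(col("id"), struct(details.as("opi1")).as("pi"), details.as("opi2"))

// Rebuild pi from its existing field plus the top-level column.
val df2 = df1.select(
  col("id"),
  struct(col("pi.opi1").as("opi1"), col("opi2").as("opi2")).as("pi")
)

df2.printSchema()
df2.show(false)
```

Structs cannot be modified in place on this Spark version, so both variants do the same thing: they list every field that should survive and build a fresh struct. If pi had more fields than opi1, each one would need to be listed in the new struct as well.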