How to add array of struct to struct of array of struct in Spark scala


I have the following example:

import spark.implicits._ // needed for .toDF (already in scope in spark-shell)
val df_temp1 = Seq(
  ("1", "Adam", "Angra", "Anastasia")
).toDF("id", "fname", "mname", "lname")
df_temp1.createOrReplaceTempView("df_temp1") // spark.sql can only query registered views

val df1 = spark.sql("""select id,
  named_struct('opi1', array(named_struct('data_description','fname','data_details',fname), named_struct('data_description','mname','data_details',mname), named_struct('data_description','lname','data_details',lname))) as pi,
  array(named_struct('data_description','fname','data_details',fname), named_struct('data_description','mname','data_details',mname), named_struct('data_description','lname','data_details',lname)) as opi2
  from df_temp1""")

That gives the following schema:

 |-- id: string (nullable = true)
 |-- pi: struct (nullable = false)
 |    |-- opi1: array (nullable = false)
 |    |    |-- element: struct (containsNull = false)
 |    |    |    |-- data_description: string (nullable = false)
 |    |    |    |-- data_details: string (nullable = true)
 |-- opi2: array (nullable = false)
 |    |-- element: struct (containsNull = false)
 |    |    |-- data_description: string (nullable = false)
 |    |    |-- data_details: string (nullable = true)

and the following result:

+---+-----------------------------------------------------+---------------------------------------------------+
|id |pi                                                   |opi2                                               |
+---+-----------------------------------------------------+---------------------------------------------------+
|1  |{[{fname, Adam}, {mname, Angra}, {lname, Anastasia}]}|[{fname, Adam}, {mname, Angra}, {lname, Anastasia}]|
+---+-----------------------------------------------------+---------------------------------------------------+

I want opi2 to be included alongside opi1 inside pi, so the expected schema should look like this:

 |-- id: string (nullable = true)
 |-- pi: struct (nullable = false)
 |    |-- opi1: array (nullable = false)
 |    |    |-- element: struct (containsNull = false)
 |    |    |    |-- data_description: string (nullable = false)
 |    |    |    |-- data_details: string (nullable = true)
 |    |-- opi2: array (nullable = false)
 |    |    |-- element: struct (containsNull = false)
 |    |    |    |-- data_description: string (nullable = false)
 |    |    |    |-- data_details: string (nullable = true)

The expected output has two arrays, opi1 and opi2, inside pi:

+---+----------------------------------------------------------------------------------------------------------+
|id |pi                                                                                                        |
+---+----------------------------------------------------------------------------------------------------------+
|1  |{[{fname, Adam}, {mname, Angra}, {lname, Anastasia}], [{fname, Adam}, {mname, Angra}, {lname, Anastasia}]}|
+---+----------------------------------------------------------------------------------------------------------+

So basically I want to add an existing column to a struct. (I am using Spark 2.3, so functions introduced in Spark 2.4 cannot be used.)

Answer:

Just create a new struct from pi.opi1 and opi2 with named_struct, which the question already uses on Spark 2.3:

df1.createOrReplaceTempView("df1") // register df1 so spark.sql can query it
val df2 = spark.sql("select id, named_struct('opi1', pi.opi1, 'opi2', opi2) as pi from df1")


+---+----------------------------------------------------------------------------------------------------------+
|id |pi                                                                                                        |
+---+----------------------------------------------------------------------------------------------------------+
|1  |{[{fname, Adam}, {mname, Angra}, {lname, Anastasia}], [{fname, Adam}, {mname, Angra}, {lname, Anastasia}]}|
+---+----------------------------------------------------------------------------------------------------------+

 |-- id: string (nullable = true)
 |-- pi: struct (nullable = false)
 |    |-- opi1: array (nullable = false)
 |    |    |-- element: struct (containsNull = false)
 |    |    |    |-- data_description: string (nullable = false)
 |    |    |    |-- data_details: string (nullable = true)
 |    |-- opi2: array (nullable = false)
 |    |    |-- element: struct (containsNull = false)
 |    |    |    |-- data_description: string (nullable = false)
 |    |    |    |-- data_details: string (nullable = true)
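The same restructuring can also be done with the DataFrame API instead of SQL. For this exact case it amounts to `df1.select(col("id"), struct(col("pi.opi1").as("opi1"), col("opi2").as("opi2")).as("pi"))`. The sketch below generalizes that into a helper, `appendToStruct` (a hypothetical name, not a Spark API), which copies every existing field of a struct column, appends another top-level column, and then drops the original top-level copy. It assumes the struct's field names contain no dots and relies only on `struct`, `col`, `array` and `lit` from `org.apache.spark.sql.functions`, which all predate Spark 2.4 (`Column.withField`, which would do this directly, only arrived in Spark 3.1). The `main` method rebuilds `df1` with the DataFrame API on a local SparkSession so the example can run on its own.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, lit, struct}
import org.apache.spark.sql.types.StructType

object AddArrayToStruct {
  // Hypothetical helper: rebuilds `structCol` with all of its current fields plus
  // the top-level column `newCol`, then drops the now-redundant top-level `newCol`.
  // Assumes field names contain no dots (they are referenced as "structCol.field").
  def appendToStruct(df: DataFrame, structCol: String, newCol: String): DataFrame = {
    val existing: Seq[Column] = df.schema(structCol).dataType match {
      case st: StructType => st.fieldNames.toSeq.map(f => col(s"$structCol.$f").as(f))
      case other => throw new IllegalArgumentException(s"$structCol is $other, not a struct")
    }
    val fields = existing :+ col(newCol).as(newCol)
    df.withColumn(structCol, struct(fields: _*)) // replaces the column in place
      .drop(newCol)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("add-array-to-struct").getOrCreate()
    import spark.implicits._

    val base = Seq(("1", "Adam", "Angra", "Anastasia")).toDF("id", "fname", "mname", "lname")

    // Same shape as opi1/opi2 in the question: array<struct<data_description, data_details>>
    def entries: Column = array(
      Seq("fname", "mname", "lname").map(c =>
        struct(lit(c).as("data_description"), col(c).as("data_details"))): _*)

    val df1 = base.select($"id", struct(entries.as("opi1")).as("pi"), entries.as("opi2"))

    val df2 = appendToStruct(df1, "pi", "opi2")
    df2.printSchema()
    df2.show(false)
    spark.stop()
  }
}
```

Because `appendToStruct` reads the field list from `df.schema`, it keeps working if `pi` later gains more fields, whereas the `named_struct` query has to list every field by hand.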