I have a DF like this:
------------ -------------------------------------------------------------
|pk_attr_name|pk_struct |
------------ -------------------------------------------------------------
|CLNT_GRP_CD |{"pk_seq":1,"pk_attr_id":20209,"pk_attr_name":"CLNT_GRP_CD"} |
|IDI_CONTRACT|{"pk_seq":2,"pk_attr_id":45483,"pk_attr_name":"IDI_CONTRACT"}|
------------ -------------------------------------------------------------
I want to define a JSON string from pk_struct colum. Desired output:
pk_struct_str = '[{"pk_seq":1,"pk_attr_id":20209,"pk_attr_name":"CLNT_GRP_CD"},{"pk_seq":2,"pk_attr_id":45483,"pk_attr_name":"IDI_CONTRACT"}]'
I tried:
pk_df.select(F.to_json(F.struct("pk_struct")).alias("json")).show(truncate=False)
but did not give me the desired result
pk_df.printSchema()
root
|-- pk_attr_name: string (nullable = true)
|-- pk_struct: string (nullable = true)
CodePudding user response:
You can achieve this result using collect_list or collect_set function.But it can be used along with aggregate function. So created dummy column and grouped by that column values and in aggregation used collect_list function
df.show(2,False)
df1 = df.withColumn("dummy",lit("XXX"))
df2 = df1.groupBy("dummy").agg(collect_list(df1.pk_struct))
df2.show(2,False)
------------ -------------------------------------------------------------
|pk_attr_name|pk_struct |
------------ -------------------------------------------------------------
|CLNT_GRP_CD |{"pk_seq":1,"pk_attr_id":20209,"pk_attr_name":"CLNT_GRP_CD"} |
|IDI_CONTRACT|{"pk_seq":2,"pk_attr_id":45483,"pk_attr_name":"IDI_CONTRACT"}|
------------ -------------------------------------------------------------
----- -----------------------------------------------------------------------------------------------------------------------------
|dummy|collect_list(pk_struct) |
----- -----------------------------------------------------------------------------------------------------------------------------
|XXX |[{"pk_seq":1,"pk_attr_id":20209,"pk_attr_name":"CLNT_GRP_CD"}, {"pk_seq":2,"pk_attr_id":45483,"pk_attr_name":"IDI_CONTRACT"}]|
----- -----------------------------------------------------------------------------------------------------------------------------