I have a DataFrame like the one below:
+----+---+------+-----+
|  id|age|height|score|
+----+---+------+-----+
|1001|  5|    80|   12|
|1002|  9|    95|  189|
|1003| 10|    82|  345|
+----+---+------+-----+
I want to create a new column that combines all of the columns into a key-value structure, while keeping a few of the columns as they are, something like this:
+----+-----------------------------------------------------------+-----+
|id  |property                                                   |score|
+----+-----------------------------------------------------------+-----+
|1001|{'id': '1001', 'age': '5', 'height': '80', 'score': '12'}  |12   |
|1002|{'id': '1002', 'age': '9', 'height': '95', 'score': '189'} |189  |
|1003|{'id': '1003', 'age': '10', 'height': '82', 'score': '345'}|345  |
+----+-----------------------------------------------------------+-----+
I tried df.withColumn('property', map(lambda row: row.asDict(), df.collect())), but it does not produce the result I want. What is wrong with my approach?
Answer:
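You can do this with the to_json and struct functions from pyspark.sql.functions: struct('*') packs every column of a row into a single struct column, and to_json serializes that struct as a JSON string. Your attempt fails because withColumn expects a Column expression, whereas map(...) over df.collect() returns a plain Python iterator of dicts built on the driver, which Spark cannot use as a column.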
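from pyspark.sql import functions as F

df = df.select(
    'id',
    # struct('*') packs all columns into one struct; to_json turns it into a JSON string,
    # e.g. {"id":1001,"age":5,"height":80,"score":12} when the columns are numeric
    F.to_json(F.struct('*')).alias('property'),
    'score'
)
df.show(truncate=False)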