Generating new column with list of other column values

Time:05-21

I have a DataFrame like this:

+----+---+------+-----+
|  id|age|height|score|
+----+---+------+-----+
|1001|  5|    80|   12|
|1002|  9|    95|  189|
|1003| 10|    82|  345|
+----+---+------+-----+

I want to create a new column that combines all of the columns into a key-value structure, while keeping a few columns as they are, something like this:

+----+-----------------------------------------------------------+-----+
|id  |property                                                   |score|
+----+-----------------------------------------------------------+-----+
|1001|{'id': '1001', 'age': '5', 'height': '80', 'score': '12'}  |12   |
|1002|{'id': '1002', 'age': '9', 'height': '95', 'score': '189'} |189  |
|1003|{'id': '1003', 'age': '10', 'height': '82', 'score': '345'}|345  |
+----+-----------------------------------------------------------+-----+

I tried df.withColumn('property', map(lambda row: row.asDict(), df.collect())), but it does not produce the result I want. What is wrong with my approach?

Answer:

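Your attempt fails because withColumn expects a Column expression that Spark evaluates for each row. df.collect() instead pulls every row to the driver as Python Row objects, and map(...) over them is just a Python iterator, not a Column. You can build the column directly with the to_json and struct functions: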

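from pyspark.sql import functions as F

# Pack every column of the row into a struct, then serialize it as a JSON string
df = df.select(
    'id',
    F.to_json(F.struct('*')).alias('property'),
    'score'
)
df.show(truncate=False)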