Multiple array except in PySpark


I have a Spark dataframe:

> numbers_df
+---+---------+--------+--------+------------------------------------+
| id|    num_1|   num_2|   num_3|                             all_num|
+---+---------+--------+--------+------------------------------------+
|  1|[1, 2, 5]|  [4, 7]|  [8, 3]|         [1, 2, 3, 4, 5, 6, 7, 8, 9]|
|  2| [12, 13]|[10, 16]|[15, 17]|[10, 11, 12, 13, 14, 15, 16, 17, 18]|
+---+---------+--------+--------+------------------------------------+

I need to subtract the values of the num_1, num_2 and num_3 columns from the all_num column.
Expected result:

+---+---------+--------+--------+------------------------------------+------------+
| id|    num_1|   num_2|   num_3|                             all_num|  except_num|
+---+---------+--------+--------+------------------------------------+------------+
|  1|[1, 2, 5]|  [4, 7]|  [8, 3]|         [1, 2, 3, 4, 5, 6, 7, 8, 9]|      [6, 9]|
|  2| [12, 13]|[10, 16]|[15, 17]|[10, 11, 12, 13, 14, 15, 16, 17, 18]|[11, 14, 18]|
+---+---------+--------+--------+------------------------------------+------------+

How can this be done in PySpark, given that the array_except function takes only two columns as input?

CodePudding user response:

You can combine the array_except and concat functions: concat merges the three array columns into a single array, which array_except then subtracts from all_num.

import pyspark.sql.functions as F

# concat merges the three array columns; array_except removes their values from all_num
df = df.withColumn('except_num', F.array_except('all_num', F.concat('num_1', 'num_2', 'num_3')))
df.show(truncate=False)
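
For a self-contained check, here is a minimal end-to-end sketch; the SparkSession setup and DataFrame construction are reconstructed from the sample data in the question:

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Rebuild the sample dataframe from the question
df = spark.createDataFrame(
    [(1, [1, 2, 5], [4, 7], [8, 3], [1, 2, 3, 4, 5, 6, 7, 8, 9]),
     (2, [12, 13], [10, 16], [15, 17], [10, 11, 12, 13, 14, 15, 16, 17, 18])],
    ['id', 'num_1', 'num_2', 'num_3', 'all_num'])

# Subtract the merged num_* arrays from all_num
result = df.withColumn(
    'except_num',
    F.array_except('all_num', F.concat('num_1', 'num_2', 'num_3')))
result.show(truncate=False)

Note that concat over array columns and array_except both require Spark 2.4 or later. Any duplicates in the concatenated array are harmless, since array_except only tests membership and returns its result without duplicates.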