Merging arrays conditionally


I have a dataframe like this:

column_1   column_2    column_3
   [1,3]        [2]           2
 [1,2,3]       null           1 
   [3,4]        [6]           1

I want to append the values of column_2 to column_1 where column_2 is not null, and then drop column_2.

Desired output:

column_1    column_3
 [1,3,2]           2
 [1,2,3]           1
 [3,4,6]           1

CodePudding user response:

  • If column_2 is null, return column_1 unchanged.
  • Otherwise, merge column_1 with column_2 using array_union (available in Spark 2.4+).

from pyspark.sql import functions as F
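# Recreate the sample dataframe from the question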
df = spark.createDataFrame(
    [([1,3], [2], 2),
     ([1,2,3], None, 1),
     ([3,4], [6], 1)],
    ['column_1', 'column_2', 'column_3']
)
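# Keep column_1 as-is when column_2 is null, otherwise merge the two arrays;
# column_2 is dropped simply by not selecting it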
df = df.select(
    F.when(F.col('column_2').isNull(), F.col('column_1')).otherwise(F.array_union('column_1', 'column_2')).alias('column_1'),
    'column_3'
)
df.show()
# +---------+--------+
# | column_1|column_3|
# +---------+--------+
# |[1, 3, 2]|       2|
# |[1, 2, 3]|       1|
# |[3, 4, 6]|       1|
# +---------+--------+
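
One thing to be aware of: array_union also removes duplicate elements from the combined array. With the sample data above the result is identical, since no row has overlapping values, but if you need a literal append that keeps duplicates, F.concat also works on array columns (Spark 2.4+). A minimal sketch, assuming df still holds the original column_1, column_2 and column_3 (the name df_appended is just for illustration):

from pyspark.sql import functions as F

df_appended = df.select(
    F.when(F.col('column_2').isNull(), F.col('column_1'))
     .otherwise(F.concat('column_1', 'column_2'))  # concat keeps duplicates; array_union would drop them
     .alias('column_1'),
    'column_3'
)
df_appended.show()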