I have dataframe like this:
column_1 column_2 column_3
[1,3] [2] 2
[1,2,3] null 1
[3,4] [6] 1
I want to append the value of column_2 to column_1 if it's not null and del column_2.
Desired output:
column_1 column_3
[1,3,2] 2
[1,2,3] 1
[3,4,6] 1
CodePudding user response:
- if column_2 is null, return column_1
- otherwise, union column_1 with column_2 using
array_union
from pyspark.sql import functions as F
df = spark.createDataFrame(
[([1,3], [2], 2),
([1,2,3], None, 1),
([3,4], [6], 1)],
['column_1', 'column_2', 'column_3']
)
df = df.select(
F.when(F.col('column_2').isNull(), F.col('column_1')).otherwise(F.array_union('column_1', 'column_2')).alias('column_1'),
'column_3'
)
df.show()
# --------- --------
# | column_1|column_3|
# --------- --------
# |[1, 3, 2]| 2|
# |[1, 2, 3]| 1|
# |[3, 4, 6]| 1|
# --------- --------