I have a Spark DataFrame that looks like following:
+--+-----------------------+---------------------+
|id|                   type|                 name|
+--+-----------------------+---------------------+
| 1|[stars]                |[sun, altair, sirius]|
| 2|[solar system, planets]|[mars, earth]        |
| 3|[natural satellites]   |[moon, io, titan]    |
+--+-----------------------+---------------------+
I want to add an extra column by concatenating the array from the type column and the array from the name column, separating the two with a semicolon (;) delimiter.
Expected output:
+--+-----------------------+---------------------+-----------------------------------+
|id|                   type|                 name|                             result|
+--+-----------------------+---------------------+-----------------------------------+
| 1|[stars]                |[sun, altair, sirius]|stars; sun, altair, sirius         |
| 2|[solar system, planets]|[mars, earth]        |solar system, planets; mars, earth |
| 3|[natural satellites]   |[moon, io, titan]    |natural satellites; moon, io, titan|
+--+-----------------------+---------------------+-----------------------------------+
I tried to apply the concat_ws function, but the result was different from what I expected. Is it possible to get the desired output using PySpark?
CodePudding user response:
You need to call concat_ws twice: once to join the array entries with a comma, then once more to join the two results with a semicolon.
SELECT id, type, name,
concat_ws('; ', concat_ws(', ', type), concat_ws(', ', name)) as result
FROM ...
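The same thing can be done with the DataFrame API instead of SQL. A minimal sketch, assuming your DataFrame is named df and has the columns id, type, and name:

from pyspark.sql import functions as F

df_result = df.withColumn(
    "result",
    F.concat_ws(
        "; ",
        F.concat_ws(", ", F.col("type")),  # join the type array with ", "
        F.concat_ws(", ", F.col("name")),  # join the name array with ", "
    ),
)
df_result.show(truncate=False)

concat_ws accepts array columns directly and joins their elements with the given separator, so no explicit explode or UDF is needed here.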