Convert two arrays into string separated by a special delimiter

Time:01-26

I have a Spark DataFrame that looks like the following:

+--+-----------------------+---------------------+
|id|                   type|                 name|
+--+-----------------------+---------------------+
| 1|[stars]                |[sun, altair, sirius]|
| 2|[solar system, planets]|[mars, earth]        |
| 3|[natural satellites]   |[moon, io, titan]    |
+--+-----------------------+---------------------+

I want to add an extra column that concatenates the array from the type column with the array from the name column. Entries within each array should be joined with a comma, and the two arrays should be separated by a semicolon (;).

Expected output:

+--+-----------------------+---------------------+-----------------------------------+
|id|                   type|                 name|                             result|
+--+-----------------------+---------------------+-----------------------------------+
| 1|[stars]                |[sun, altair, sirius]|stars; sun, altair, sirius         |
| 2|[solar system, planets]|[mars, earth]        |solar system, planets; mars, earth |
| 3|[natural satellites]   |[moon, io, titan]    |natural satellites; moon, io, titan|
+--+-----------------------+---------------------+-----------------------------------+

I tried applying the concat_ws function, but the result was different from what I expected.

So, is it possible to get the desired output using PySpark?

CodePudding user response:

You need to call concat_ws on each array to join its entries with a comma, then call it once more to join the two resulting strings with a semicolon.

SELECT id, type, name,
       concat_ws('; ', concat_ws(', ', type), concat_ws(', ', name)) as result
  FROM ...