I have a table:
Department | Name | Start | End |
---|---|---|---|
Finance | John Doe | 01/01/2022 | 01/05/2002 |
Marketing | Mark Smith | 05/02/2022 | 08/03/2002 |
I want to transform the table the same way a CROSS JOIN UNNEST would in SQL.
So the output I'm looking for is:
Department | Name | Event | Date |
---|---|---|---|
Finance | John Doe | Start | 01/01/2022 |
Finance | John Doe | End | 01/05/2002 |
Marketing | Mark Smith | Start | 05/02/2022 |
Marketing | Mark Smith | End | 08/03/2002 |
How can I achieve this with a PySpark dataframe?
CodePudding user response:
Using the example you provided, stack can do it:
from pyspark.sql import functions as F
df = spark.createDataFrame(
[('Finance', 'John Doe', '01/01/2022', '01/05/2002'),
('Marketing', 'Mark Smith', '05/02/2022', '08/03/2002')],
['Department', 'Name', 'Start', 'End'])
# stack(2, ...) expands the two (label, column) pairs into two rows per input row
df = df.select('Department', 'Name', F.expr("stack(2, 'Start', Start, 'End', End) as (Event, Date)"))
df.show()
# +----------+----------+-----+----------+
# |Department|      Name|Event|      Date|
# +----------+----------+-----+----------+
# |   Finance|  John Doe|Start|01/01/2022|
# |   Finance|  John Doe|  End|01/05/2002|
# | Marketing|Mark Smith|Start|05/02/2022|
# | Marketing|Mark Smith|  End|08/03/2002|
# +----------+----------+-----+----------+
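For intuition, stack(2, 'Start', Start, 'End', End) emits two output rows per input row: one labelled 'Start' carrying the Start value, and one labelled 'End' carrying the End value. Here is the same expansion sketched in plain Python (no Spark required), just to illustrate the semantics:

```python
# Sample rows matching the question's table.
rows = [
    ('Finance', 'John Doe', '01/01/2022', '01/05/2002'),
    ('Marketing', 'Mark Smith', '05/02/2022', '08/03/2002'),
]

# Each (Department, Name, Start, End) row expands into two
# (Department, Name, Event, Date) rows, mirroring the stack call above.
unpivoted = [
    (dept, name, event, date)
    for dept, name, start, end in rows
    for event, date in (('Start', start), ('End', end))
]

for r in unpivoted:
    print(r)
```

Note also that on Spark 3.4+ the same unpivot is available directly as DataFrame.unpivot (alias melt), without writing a stack expression by hand.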