How to cross join unnest in PySpark?

Time:06-21

I have a table:

Department | Name       | Start      | End
Finance    | John Doe   | 01/01/2022 | 01/05/2002
Marketing  | Mark Smith | 05/02/2022 | 08/03/2002

I want to transform this table the way a CROSS JOIN UNNEST would in SQL.

So the output I'm looking for is:

Department | Name       | Event | Date
Finance    | John Doe   | Start | 01/01/2022
Finance    | John Doe   | End   | 01/05/2002
Marketing  | Mark Smith | Start | 05/02/2022
Marketing  | Mark Smith | End   | 08/03/2002

How can I achieve this with a PySpark DataFrame?

CodePudding user response:

Using the example you provided, the stack function can do it. It takes the number of rows to generate (2 here) followed by pairs of (label, column) values, emitting one output row per pair:

from pyspark.sql import functions as F
df = spark.createDataFrame(
    [('Finance', 'John Doe', '01/01/2022', '01/05/2002'),
     ('Marketing', 'Mark Smith', '05/02/2022', '08/03/2002')],
    ['Department', 'Name', 'Start', 'End'])

df = df.select('Department', 'Name', F.expr("stack(2, 'Start', Start, 'End', End) as (Event, Date)"))

df.show()
#  ---------- ---------- ----- ---------- 
# |Department|      Name|Event|      Date|
#  ---------- ---------- ----- ---------- 
# |   Finance|  John Doe|Start|01/01/2022|
# |   Finance|  John Doe|  End|01/05/2002|
# | Marketing|Mark Smith|Start|05/02/2022|
# | Marketing|Mark Smith|  End|08/03/2002|
#  ---------- ---------- ----- ---------- 