How to merge columns into one on top of each other in PySpark?

I have a PySpark DataFrame that looks like this:

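from pyspark.sql import SparkSession

# Not needed in the pyspark shell or a notebook where `spark` already exists.
spark = SparkSession.builder.getOrCreate()

data = [
    ("James", "Joyce"),
    ("Michael", "Doglus"),
    ("Robert", "Connings"),
    ("Maria", "XYZ"),
    ("Jen", "PQR"),
]

df2 = spark.createDataFrame(data, ["Name", "Lots_of_names"])
df2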


    Name     Lots_of_names
0   James    Joyce
1   Michael  Doglus
2   Robert   Connings
3   Maria    XYZ
4   Jen      PQR

I want to merge the two columns into one long column (probably in a new DataFrame) that will have 10 rows. Is there a way to do this? Thanks in advance.

CodePudding user response:

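You are probably looking for something like this. F.array packs the two columns into one array per row, and F.explode then emits one output row per array element: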

import pyspark.sql.functions as F

df_out = df2.select(F.explode(F.array("Name", "Lots_of_names")).alias("one_col"))

This produces df_out as follows:

# one_col
#------
# James
# Joyce
# Michael
# Doglus
# Robert
# Connings
# Maria
# XYZ
# Jen
# PQR
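
Note the row order in that output: the values are interleaved row by row (James, Joyce, Michael, ...), not stacked column by column. To make the difference explicit, here is a minimal plain-Python sketch that only models the two orderings. It does not use Spark, and explode_pairs and stack_columns are hypothetical helper names made up for illustration.

```python
# Plain-Python model of the two possible row orders; no Spark involved.
# explode_pairs and stack_columns are hypothetical helpers, not PySpark APIs.

rows = [
    ("James", "Joyce"),
    ("Michael", "Doglus"),
    ("Robert", "Connings"),
    ("Maria", "XYZ"),
    ("Jen", "PQR"),
]


def explode_pairs(rows):
    """Model of array + explode: emit every value of a row, then the next row."""
    return [value for row in rows for value in row]


def stack_columns(rows):
    """Model of column stacking: the whole first column, then the whole second."""
    return [row[0] for row in rows] + [row[1] for row in rows]


interleaved = explode_pairs(rows)
stacked = stack_columns(rows)

print(interleaved)  # same order as df_out above
print(stacked)      # all of Name first, then all of Lots_of_names
```

If the stacked order is what you want in Spark itself, one option to try is df2.select("Name").union(df2.select("Lots_of_names")). union matches columns by position, the resulting column keeps the name "Name", and Spark does not promise any particular row order unless you sort explicitly.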