How to merge columns into one on top of each other in pyspark?

Time:11-20

I have a PySpark DataFrame that looks like this:

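from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

data = [("James", "Joyce"),
        ("Michael", "Doglus"),
        ("Robert", "Connings"),
        ("Maria", "XYZ"),
        ("Jen", "PQR")]

df2 = spark.createDataFrame(data, ["Name", "Lots_of_names"])
df2.show()

+-------+-------------+
|   Name|Lots_of_names|
+-------+-------------+
|  James|        Joyce|
|Michael|       Doglus|
| Robert|     Connings|
|  Maria|          XYZ|
|    Jen|          PQR|
+-------+-------------+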

I want to stack the two columns on top of each other into a single long column (probably in a new DataFrame) with 10 rows. Is there a way to do this? Thanks in advance.

Answer:

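You are probably looking for something like this: build an array from the two columns in each row, then explode it so that each array element becomes its own row.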

import pyspark.sql.functions as F

# Pack each row's two values into an array, then explode it: one output row per element
df_out = df2.select(F.explode(F.array("Name", "Lots_of_names")).alias("one_col"))

This produces the following df_out, with each row's Name value followed by its Lots_of_names value:

# one_col
#------
# James
# Joyce
# Michael
# Doglus
# Robert
# Connings
# Maria
# XYZ
# Jen
# PQR