I have a PySpark DataFrame that looks like this:
data = [("James","Joyce"),
("Michael","Doglus"),
("Robert","Connings"),
("Maria","XYZ"),
("Jen","PQR")
]
df2 = spark.createDataFrame(data, ["Name", "Lots_of_names"])
df2
Name Lots_of_names
0 James Joyce
1 Michael Doglus
2 Robert Connings
3 Maria XYZ
4 Jen PQR
I want to merge the two columns into one long column (probably in a new DataFrame) that will have 10 rows. Is there a way to do this? Thanks in advance.
CodePudding user response:
You are probably looking for something like this: pack both columns into an array, then explode it so that each array element becomes its own row.
import pyspark.sql.functions as F
df_out = df2.select(F.explode(F.array("Name", "Lots_of_names")).alias("one_col"))
This produces df_out as follows:
# one_col
#------
# James
# Joyce
# Michael
# Doglus
# Robert
# Connings
# Maria
# XYZ
# Jen
# PQR