PySpark: array column to DataFrame

I need to turn an array column of a PySpark DataFrame into a DataFrame of its own.

Input:

| number | values | combination |
|---|---|---|
| a | [e, f, g] | `[[e, f], [e, g], [f, g], ...]` |
| b | [e, f, g, h] | `[[e, f], [e, g], [f, g], [f, h], ...]` |
| c | [b, c] | `[[b, c]]` |

In the output I only want the contents of the `combination` column, one pair per row:

| value1 | value2 |
|---|---|
| e | f |
| e | g |
| f | g |
| e | f |
| e | g |
| f | g |
| f | h |
| b | c |

I want every pair extracted as its own row of a single DataFrame, without using loops.

Answer:

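Assuming the input DataFrame is `df`, explode the `combination` column so that each inner pair becomes its own row, then select the two elements of each pair as separate columns: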

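```python
from pyspark.sql import functions as F

# explode() emits one row per inner pair of the `combination` array
pairs = df.select(F.explode("combination").alias("pair"))
# Index into each two-element pair to get the output columns
result = pairs.select(F.col("pair")[0].alias("value1"), F.col("pair")[1].alias("value2"))
```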