Pyspark Array column to dataframe

Time:07-12

I need to transform an array column in a PySpark DataFrame into a DataFrame of its own.

Input:

number  values        combination
a       [e, f, g]     [[e, f], [e, g], [f, g]...]
b       [e, f, g, h]  [[e, f], [e, g], [f, g], [f, h]...]
c       [b, c]        [[b, c]]

I want the output to contain only the combination column, split into two columns:

value1 value2
e f
e g
f g
e f
e g
f g
f h
b c

I want to extract these pairs row by row within the same DataFrame, without using Python loops.

Answer:

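Assume the input DataFrame is df. Explode the combination column so that each pair gets its own row, then select the two elements of each pair as separate columns.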

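from pyspark.sql import functions as F

# One output row per inner pair of the combination array
pairs = df.select(F.explode("combination").alias("pair"))
# Split each pair into two columns by position
result = pairs.select(F.col("pair")[0].alias("value1"), F.col("pair")[1].alias("value2"))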
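
To see what the explode-then-index approach does without a Spark session, the same transformation can be modelled in plain Python. This is only a sketch of the semantics, not PySpark code: the rows list and the explode_pairs helper are hypothetical names introduced for illustration, and the data contains only the pairs that are visible in the question (the elided ones are left out).

```python
# Plain-Python model of "explode, then take element 0 and element 1".
# Assumption: each row is a dict; only the pairs shown in the question are included.
rows = [
    {"number": "a", "combination": [["e", "f"], ["e", "g"], ["f", "g"]]},
    {"number": "b", "combination": [["e", "f"], ["e", "g"], ["f", "g"], ["f", "h"]]},
    {"number": "c", "combination": [["b", "c"]]},
]


def explode_pairs(rows):
    """Yield one (value1, value2) tuple per inner pair, in row order."""
    for row in rows:
        # "explode": every element of the array becomes its own output row;
        # a missing or empty array contributes no rows at all.
        for pair in row["combination"] or []:
            # "indexing": position 0 becomes value1, position 1 becomes value2
            yield pair[0], pair[1]


result = list(explode_pairs(rows))

print("value1 value2")
for value1, value2 in result:
    print(value1, value2)
```

As in this model, Spark's explode produces no output row for an input row whose array is null or empty. If such rows need to be kept, explode_outer in pyspark.sql.functions emits a row with a null value instead.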