How to add (explode) a new column from a list to a Spark DataFrame?

I currently have a DataFrame like the one below, and I want to add a new column called product_id.

+---+
| id|
+---+
|  0|
|  1|
+---+

The values for product_id are derived from a List[String]. An example of this list is: sampleList = List("A", "B", "C")

For each id in the DataFrame, I want to add every product_id, so that each id is paired with each element of the list:

+---+----------+
| id|product_id|
+---+----------+
|  0|         A|
|  0|         B|
|  0|         C|
|  1|         A|
|  1|         B|
|  1|         C|
+---+----------+

Is there a way to do this?

CodePudding user response:

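You can turn the list into its own DataFrame and use the crossJoin method, which pairs every row of one DataFrame with every row of the other.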

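// toDF on a local collection needs the implicits of the active SparkSession
// (named spark here, as in spark-shell).
import spark.implicits._

val ls1 = List(0, 1)
val df1 = ls1.toDF("id")
val sampleList = List("A", "B", "C")
val df2 = sampleList.toDF("product_id")
// Pair every id with every product_id.
val df = df1.crossJoin(df2)
df.show()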
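
To make it clearer what the cross join computes, here is a minimal sketch with plain Scala collections instead of Spark. The names ids and pairs are made up for this illustration. It only shows the pairing logic (every id with every product_id) and says nothing about how Spark executes the join or orders the resulting rows.

```scala
// Plain Scala collections, no Spark required.
// `ids` and `pairs` are hypothetical names used only for this sketch.
val ids = List(0, 1)
val sampleList = List("A", "B", "C")

// Cartesian product: pair every id with every product_id.
val pairs = for {
  id <- ids
  productId <- sampleList
} yield (id, productId)

// 2 ids x 3 product ids = 6 pairs
pairs.foreach(println)
```

The result has ids.size * sampleList.size elements, which is also the row count to expect from the cross join above.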

CodePudding user response:

Generate a sample DataFrame and list

val sampleList = List("A", "B", "C")

val df = spark.range(2)

df.show()

+---+
| id|
+---+
|  0|
|  1|
+---+

Solution

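import org.apache.spark.sql.functions.{array, explode, lit}

// Turn each list element into a literal column, collect them into one array
// column, then explode that array so each element becomes its own row.
val explode_df = df.withColumn("product_id", explode(array(sampleList.map(lit(_)): _*)))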

explode_df.show()

+---+----------+
| id|product_id|
+---+----------+
|  0|         A|
|  0|         B|
|  0|         C|
|  1|         A|
|  1|         B|
|  1|         C|
+---+----------+
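
The snippets above assume a spark-shell session where spark already exists. If you want to run the explode approach as a standalone program, a minimal sketch could look like the following. The object name ExplodeProductIds, the app name, and the local[*] master are assumptions for illustration, and Spark SQL has to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, explode, lit}

// Hypothetical standalone wrapper around the explode approach.
object ExplodeProductIds {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session for testing; adjust master/appName as needed.
    val spark = SparkSession.builder()
      .appName("explode-product-ids")
      .master("local[*]")
      .getOrCreate()

    val sampleList = List("A", "B", "C")
    val df = spark.range(2) // single column "id" with values 0 and 1

    // Build one array column from the list literals, then explode it
    // so every id gets one row per list element.
    val exploded = df.withColumn(
      "product_id",
      explode(array(sampleList.map(lit(_)): _*))
    )

    exploded.show()
    spark.stop()
  }
}
```

Compared with crossJoin, this avoids creating a second DataFrame, which is convenient when the list is small and already available on the driver.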