I currently have a dataframe like the one below, and I want to add a new column called product_id.
+---+
| id|
+---+
|  0|
|  1|
+---+
The values for product_id are derived from a List[String]. An example of this list:
sampleList = List(A, B, C)
For each id in the dataframe, I want to add a row for every product_id:
+---+----------+
| id|product_id|
+---+----------+
|  0|         A|
|  0|         B|
|  0|         C|
|  1|         A|
|  1|         B|
|  1|         C|
+---+----------+
Is there a way to do this?
CodePudding user response:
You can use the crossJoin method, which pairs every row of one dataframe with every row of the other.
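// Needed for .toDF on local collections (assumes a SparkSession named spark)
import spark.implicits._

val ls1 = List(0, 1)
val df1 = ls1.toDF("id")
val sampleList = List("A", "B", "C")
val df2 = sampleList.toDF("product_id")
// Pair every id with every product_id (Cartesian product)
val df = df1.crossJoin(df2)
df.show()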
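To make the pairing concrete, here is a minimal plain-Scala sketch of what a cross join computes, using ordinary collections instead of Spark. The names CrossJoinSketch and pairs are made up for illustration and are not part of the answer above. Spark does not promise any particular row order for crossJoin, so the ordering here only reflects how the local for-comprehension iterates.

```scala
// Plain-Scala model of a cross join; no Spark dependency.
// CrossJoinSketch and pairs are hypothetical names used only for this sketch.
object CrossJoinSketch {
  val ids: List[Int] = List(0, 1)
  val sampleList: List[String] = List("A", "B", "C")

  // Pair every id with every product_id (Cartesian product).
  val pairs: List[(Int, String)] =
    for {
      id <- ids
      productId <- sampleList
    } yield (id, productId)

  def main(args: Array[String]): Unit =
    pairs.foreach { case (id, productId) => println(s"$id $productId") }
}
```

The result has ids.size * sampleList.size entries, which is why a cross join against a large list can grow quickly.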
CodePudding user response:
Generation of a sample dataframe & list
val sampleList = List("A", "B", "C")
val df = spark.range(2)
df.show()
+---+
| id|
+---+
|  0|
|  1|
+---+
Solution
import org.apache.spark.sql.functions.{array, explode, lit}
// Build a constant array column from the list, then emit one row per array element
val explode_df = df.withColumn("product_id", explode(array(sampleList.map(lit): _*)))
explode_df.show()
+---+----------+
| id|product_id|
+---+----------+
|  0|         A|
|  0|         B|
|  0|         C|
|  1|         A|
|  1|         B|
|  1|         C|
+---+----------+
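The two steps in the explode approach can be modeled with plain Scala collections, as a rough sketch rather than a description of how Spark runs it internally. ExplodeSketch, withArray, and exploded are hypothetical names chosen for this illustration.

```scala
// Plain-Scala model of explode(array(...)); no Spark dependency.
// ExplodeSketch, withArray and exploded are hypothetical names for this sketch.
object ExplodeSketch {
  val ids: List[Long] = List(0L, 1L) // spark.range(2) yields ids of type Long
  val sampleList: List[String] = List("A", "B", "C")

  // Step 1: array(lit("A"), lit("B"), lit("C")) attaches the same constant array to every row.
  val withArray: List[(Long, List[String])] = ids.map(id => (id, sampleList))

  // Step 2: explode emits one output row per element of that array.
  val exploded: List[(Long, String)] =
    withArray.flatMap { case (id, products) => products.map(p => (id, p)) }
}
```

Because every row carries the same array, the exploded result contains the same id and product_id pairs that a cross join against the list would produce.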