I'm a beginner with Spark in Scala. I want to duplicate each row in my DataFrame based on an input value.
My input dataset looks something like this:
+---+--------+-----+
|id |currency|value|
+---+--------+-----+
|1  |USD     |10   |
|1  |EUR     |20   |
|2  |USD     |30   |
|2  |EUR     |40   |
+---+--------+-----+
My input value is a sequence, for example Seq("JPY"). I want an output like this:
+---+--------+-----+
|id |currency|value|
+---+--------+-----+
|1  |USD     |10   |
|1  |EUR     |20   |
|2  |USD     |30   |
|2  |EUR     |40   |
|1  |JPY     |10   |
|1  |JPY     |20   |
|2  |JPY     |30   |
|2  |JPY     |40   |
+---+--------+-----+
Could someone please guide me on how to solve this?
CodePudding user response:
You can do it by creating a copy of the DataFrame with the currency column overwritten, then taking the union of the two DataFrames:
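import org.apache.spark.sql.functions.lit
import spark.implicits._

// Sample data matching the question
val df = Seq(
  ("1", "USD", "10"),
  ("1", "EUR", "20"),
  ("2", "USD", "30"),
  ("2", "EUR", "40")
).toDF("id", "currency", "value")

// Append a copy of every row with the currency replaced by "JPY"
df.union(df.withColumn("currency", lit("JPY"))).show(false)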
+---+--------+-----+
|id |currency|value|
+---+--------+-----+
|1  |USD     |10   |
|1  |EUR     |20   |
|2  |USD     |30   |
|2  |EUR     |40   |
|1  |JPY     |10   |
|1  |JPY     |20   |
|2  |JPY     |30   |
|2  |JPY     |40   |
+---+--------+-----+
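Since the input in the question is a Seq that could hold more than one currency, the same union idea can be folded over the sequence. This is only a minimal sketch, assuming a local SparkSession; the helper name withExtraCurrencies and the extra "GBP" value are made up for illustration and are not part of the question:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

object DuplicateRowsPerCurrency {
  // Hypothetical helper: for every entry in `currencies`, appends a copy of
  // the original rows of `df` with the "currency" column overwritten.
  def withExtraCurrencies(df: DataFrame, currencies: Seq[String]): DataFrame =
    currencies.foldLeft(df) { (acc, currency) =>
      acc.union(df.withColumn("currency", lit(currency)))
    }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, just for trying the sketch out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("duplicate-rows")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("1", "USD", "10"),
      ("1", "EUR", "20"),
      ("2", "USD", "30"),
      ("2", "EUR", "40")
    ).toDF("id", "currency", "value")

    // 4 original rows plus 4 copies per extra currency
    withExtraCurrencies(df, Seq("JPY", "GBP")).show(false)

    spark.stop()
  }
}
```

Each element of the sequence adds one more union to the query plan, so this is probably fine for a handful of values; for a long sequence an explode-based approach is likely the better fit.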
CodePudding user response:
You can add the sequence as an array column, explode it into one row per currency, and then union the result with the original DataFrame:
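import org.apache.spark.sql.functions._
import spark.implicits._ // needed for toDF; assumes a SparkSession named spark is in scope

val inputData = Seq((1, "USD", 10), (1, "EUR", 20), (2, "USD", 30), (2, "EUR", 40))
val inputSeq = Seq("JPY")
val originalDf = inputData.toDF("id", "currency", "value")
// Attach the whole sequence to every row as an array column
val originalDfWithSequence = originalDf.withColumn("currencies_seq", typedLit(inputSeq))
// One output row per (input row, currency in the sequence), in the original column order
val originalDfExploded = originalDfWithSequence.select(
  col("id"),
  explode(col("currencies_seq")).alias("currency"),
  col("value")
)
originalDf.union(originalDfExploded).show()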
The output is:
+---+--------+-----+
| id|currency|value|
+---+--------+-----+
|  1|     USD|   10|
|  1|     EUR|   20|
|  2|     USD|   30|
|  2|     EUR|   40|
|  1|     JPY|   10|
|  1|     JPY|   20|
|  2|     JPY|   30|
|  2|     JPY|   40|
+---+--------+-----+