Spark is reading from Cosmos DB, which contains records like:
{
  "answers": [
    {
      "answer": "2005-01-01 00:00",
      "answerDt": "2022-07-01CEST08:07",
      ...,
      "id": {uuid}
    }
  ],
  ...
}
and code that takes those answers and creates a DataFrame where each row is one record from that array:
dataDF
  .select(
    col("id").as("recordId"),
    explode(col("answers")).as("qa") // one output row per element of the answers array
  )
  .select(
    col("recordId"),
    col("qa.questionText"),
    col("qa.question").as("q-id"),
    col("qa.answerText"),
    col("qa.answerDt")
  )
  .withColumn("id", concat_ws("-", col("q-id"), col("recordId"))) // composite id per answer
  .drop(col("q-id"))
At the end I save it to another collection. What I need is to add a position number to those records, so that each answer row also carries an int number that is unique per recordId, e.g. from 1 to 20:
+---+--------------------+--------------------+----------+-------------------+--------------------+
| lp|            recordId|        questionText|answerText|           answerDt|                  id|
+---+--------------------+--------------------+----------+-------------------+--------------------+
|  1|951a508c-d970-4d2...|Please give me th...|    197...|2022-06-28CEST16:52|123abcde_VB_GEN_Q...|
|  2|951a508c-d970-4d2...|What X should I N...|    female|2022-06-28CEST16:52|123abcde_VB_GEN_Q...|
|  3|951a508c-d970-4d2...|Please Share me t...|     72 kg|2022-06-28CEST16:53|123abcde_VB_GEN_Q...|
|  1|12345678-0987-4d2...|Give me the smth ...|     10 kg|2022-06-28CEST16:53|123abcde_VB_GEN_Q...|
+---+--------------------+--------------------+----------+-------------------+--------------------+
Is it possible? Thanks.
CodePudding user response:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

// Number the rows within each recordId; order by whichever column
// defines the desired sequence.
val w = Window.partitionBy("recordId").orderBy("your col")
val resDF = sourceDF.withColumn("row_num", row_number().over(w))
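For completeness, a minimal sketch of how that window could slot into the pipeline from the question. The file name answers.json, the multiLine read option, and the choice of answerDt as the ordering column are assumptions, not part of the original code:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, concat_ws, explode, posexplode, row_number}

val spark = SparkSession.builder.appName("answers-lp").getOrCreate()

// Assumption: any source with the same schema works; here the sample
// record is read from a multi-line JSON file.
val dataDF = spark.read.option("multiLine", true).json("answers.json")

val exploded = dataDF
  .select(col("id").as("recordId"), explode(col("answers")).as("qa"))
  .select(
    col("recordId"),
    col("qa.questionText"),
    col("qa.question").as("q-id"),
    col("qa.answerText"),
    col("qa.answerDt")
  )
  .withColumn("id", concat_ws("-", col("q-id"), col("recordId")))
  .drop(col("q-id"))

// Assumption: answerDt reflects the order in which the answers were
// given, so ordering by it yields a stable 1..n numbering per recordId.
val w = Window.partitionBy("recordId").orderBy(col("answerDt"))
val resDF = exploded.withColumn("lp", row_number().over(w))

One design note: if the numbering should follow the array's original element order rather than a timestamp, posexplode avoids the window entirely by exposing the element index directly (0-based):

// Alternative sketch using posexplode instead of explode + window.
val byPos = dataDF
  .select(col("id").as("recordId"), posexplode(col("answers")).as(Seq("pos", "qa")))
  .withColumn("lp", col("pos") + 1) // 1-based, matching the sample output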