Scala Spark, array with incremental new column


Spark is reading from cosmosDB, which contains records like:

{
    "answers": [
        {
            "answer": "2005-01-01 00:00",
            "answerDt": "2022-07-01CEST08:07",
            ...
        },
        ...
    ],
    "id": {uuid}
}

and code that takes those answers and builds a DataFrame where each row is a new record from that array:

dataDF
    .select(
      col("id").as("recordId"),
      explode($"answers").as("qa")
    )
    .select(
      col("recordId"),
      $"qa.questionText",
      col("qa.question").as("q-id"),
      $"qa.answerText",
      $"qa.answerDt"
    )
    .withColumn("id", concat_ws("-", col("q-id"), col("recordId")))
    .drop(col("q-id"))

At the end I save it to another collection. What I need is to add a position number to those records, so that each answer row also gets an int number that is unique per recordId, e.g. from 1 to 20:
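A minimal sketch of the numbering being asked for, modeled with plain Scala collections (the record IDs and answer values are made-up placeholders): rows grouped by recordId each get consecutive integers starting at 1, which is what Spark's row_number window function produces per partition.

```scala
// Model rows as (recordId, answerText) pairs; the real DataFrame has more columns.
val rows = Seq(
  ("rec-1", "1975-01-01"),
  ("rec-1", "female"),
  ("rec-1", "72 kg"),
  ("rec-2", "10 kg")
)

// Group by recordId and number each row within its group starting from 1,
// mimicking row_number() over a window partitioned by recordId.
val numbered = rows
  .groupBy(_._1)
  .toSeq
  .flatMap { case (recordId, group) =>
    group.zipWithIndex.map { case ((_, answer), i) => (i + 1, recordId, answer) }
  }

// Each recordId gets its own 1-based sequence.
val rec1Numbers = numbered.filter(_._2 == "rec-1").map(_._1).sorted // Seq(1, 2, 3)
val rec2Numbers = numbered.filter(_._2 == "rec-2").map(_._1)        // Seq(1)
```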

+---+--------------------+--------------------+--------------------+-------------------+--------------------+
| lp|            recordId|        questionText|          answerText|           answerDt|                  id|
+---+--------------------+--------------------+--------------------+-------------------+--------------------+
|  1|951a508c-d970-4d2...|Please give me th...|              197...|2022-06-28CEST16:52|123abcde_VB_GEN_Q...|
|  2|951a508c-d970-4d2...|What X should I N...|              female|2022-06-28CEST16:52|123abcde_VB_GEN_Q...|
|  3|951a508c-d970-4d2...|Please Share me t...|               72 kg|2022-06-28CEST16:53|123abcde_VB_GEN_Q...|
|  1|12345678-0987-4d2...|Give me the smth ...|               10 kg|2022-06-28CEST16:53|123abcde_VB_GEN_Q...|
+---+--------------------+--------------------+--------------------+-------------------+--------------------+

Is it possible? Thanks.

CodePudding user response:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

// Partition by recordId and order by a column that defines the row order
// within each record (answerDt is one candidate from your schema).
val w = Window.partitionBy("recordId").orderBy("answerDt")
val resDF = sourceDF.withColumn("lp", row_number.over(w))
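Alternatively, since the rows come from exploding an array, the position can be captured at explode time: Spark's posexplode returns a 0-based pos column alongside each element, so pos + 1 gives the per-record counter with no window needed. A plain-Scala model of that idea (the record ID and answers are placeholders):

```scala
// One record with its answers array, as in the Cosmos DB documents.
val record = ("951a508c", Seq("1975-01-01", "female", "72 kg"))

// posexplode-style flattening: emit (recordId, position, element),
// shifting the 0-based array index so the counter starts at 1.
val exploded = record._2.zipWithIndex.map {
  case (answer, pos) => (record._1, pos + 1, answer)
}
```

In the actual job this would mean replacing explode($"answers") with posexplode($"answers"), which yields pos and col columns in the result.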