Home > Software design >  How to add some values in a dataframe in Scala Spark?
How to add some values in a dataframe in Scala Spark?

Time:10-10

Here is the dataframe I have for now, suppose there are totally 4 days{1,2,3,4}:

 ------------- ---------- ------ 
| key         | Time     | Value|
 ------------- ---------- ------ 
|  1          |        1 | 1    | 
|  1          |        2 | 2    |
|  1          |        4 | 3    |
|  2          |        2 | 4    |
|  2          |        3 | 5    | 
 ------------- ---------- ------ 

And what I want is

 ------------- ---------- ------ 
| key         | Time     | Value|  
 ------------- ---------- ------ 
|  1          |        1 | 1    | 
|  1          |        2 | 2    | 
|  1          |        3 | null | 
|  1          |        4 | 3    | 
|  2          |        1 | null |
|  2          |        2 | 4    |
|  2          |        3 | 5    | 
|  2          |        4 | null | 
 ------------- ---------- ------ 

If there is some ways that can help me get this?

CodePudding user response:

Say df1 is our main table:

 --- ---- ----- 
|key|Time|Value|
 --- ---- ----- 
|1  |1   |1    |
|1  |2   |2    |
|1  |4   |3    |
|2  |2   |4    |
|2  |3   |5    |
 --- ---- ----- 

We can use the following transformations:

val data = df1
  // we first group by and aggregate the values to a sequence between 1 and 4 (your number)
  .groupBy("key")
  .agg(sequence(lit(1), lit(4)).as("Time"))
  // we explode the sequence, thus creating all 'Time' per 'key'
  .withColumn("Time", explode(col("Time")))
  // finally, we join with our main table on 'key' and 'Time'
  .join(df1, Seq("key", "Time"), "left")

To get this output:

 --- ---- ----- 
|key|Time|Value|
 --- ---- ----- 
|1  |1   |1    |
|1  |2   |2    |
|1  |3   |null |
|1  |4   |3    |
|2  |1   |null |
|2  |2   |4    |
|2  |3   |5    |
|2  |4   |null |
 --- ---- ----- 

Which should be what you are looking for, good luck!

  • Related