Here is the dataframe I have for now, suppose there are totally 4 days{1,2,3,4}:
------------- ---------- ------
| key | Time | Value|
------------- ---------- ------
| 1 | 1 | 1 |
| 1 | 2 | 2 |
| 1 | 4 | 3 |
| 2 | 2 | 4 |
| 2 | 3 | 5 |
------------- ---------- ------
And what I want is
------------- ---------- ------
| key | Time | Value|
------------- ---------- ------
| 1 | 1 | 1 |
| 1 | 2 | 2 |
| 1 | 3 | null |
| 1 | 4 | 3 |
| 2 | 1 | null |
| 2 | 2 | 4 |
| 2 | 3 | 5 |
| 2 | 4 | null |
------------- ---------- ------
If there is some ways that can help me get this?
CodePudding user response:
Say df1
is our main table:
--- ---- -----
|key|Time|Value|
--- ---- -----
|1 |1 |1 |
|1 |2 |2 |
|1 |4 |3 |
|2 |2 |4 |
|2 |3 |5 |
--- ---- -----
We can use the following transformations:
val data = df1
// we first group by and aggregate the values to a sequence between 1 and 4 (your number)
.groupBy("key")
.agg(sequence(lit(1), lit(4)).as("Time"))
// we explode the sequence, thus creating all 'Time' per 'key'
.withColumn("Time", explode(col("Time")))
// finally, we join with our main table on 'key' and 'Time'
.join(df1, Seq("key", "Time"), "left")
To get this output:
--- ---- -----
|key|Time|Value|
--- ---- -----
|1 |1 |1 |
|1 |2 |2 |
|1 |3 |null |
|1 |4 |3 |
|2 |1 |null |
|2 |2 |4 |
|2 |3 |5 |
|2 |4 |null |
--- ---- -----
Which should be what you are looking for, good luck!