I have the following table
var1 | time | var2 |
---|---|---|
val1 | 1 | a |
val1 | 2 | a |
val1 | 3 | a |
val1 | 4 | a |
val2 | 5 | a |
val1 | 6 | a |
val3 | 1 | b |
val4 | 2 | b |
val4 | 3 | b |
val5 | 4 | b |
And I wish to remove all the rows with repeating values in var1
and var2
keeping changes in at least one of them. Sorry for the unclear explanation, instead, here is an example of how the desired table will be:
var1 | time | var2 |
---|---|---|
val1 | 1 | a |
val2 | 5 | a |
val1 | 6 | a |
val3 | 1 | b |
val4 | 2 | b |
val5 | 4 | b |
Lastly, is there a way to know how much space this table will take before executing the query?
EDIT: fixing variables' names in the tables
CodePudding user response:
This is a gaps and islands problem, you could try the difference between two row numbers to solve it as the following:
WITH define_groups AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY var2 ORDER BY time) -
ROW_NUMBER() OVER (PARTITION BY var1, var2 ORDER BY time) grp
FROM table_name
),
row_numbering AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY var2, grp ORDER BY time) rn
FROM define_groups
)
SELECT var1, time, var2
FROM row_numbering
WHERE rn = 1
ORDER BY var2, time
See demo.