I have a table like this one:
| machine | type | value | id |
|---------|------|-------|----|
| 1       | a    | 0.34  | 1  |
| 1       | b    | 0.23  | 2  |
| 1       | b    | 0.26  | 3  |
| 1       | b    | 0.35  | 4  |
| 1       | a    | 0.15  | 5  |
| 1       | b    | 0.45  | 6  |
| 1       | b    | 0.55  | 7  |
I would like to obtain a table in which each type "b" row also carries, as an extra column, the value of the most recent preceding type "a" row (id is not relevant in the output). For example:
| machine | value | value_col |
|---------|-------|-----------|
| 1       | 0.23  | 0.34      |
| 1       | 0.26  | 0.34      |
| 1       | 0.35  | 0.34      |
| 1       | 0.45  | 0.15      |
| 1       | 0.55  | 0.15      |
I tried SQL Server's `PIVOT` function, but I cannot obtain the desired result.
Any help?
Thanks!
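(For reference, a minimal script to recreate the sample data; the table name `tbl` is an assumption, chosen to match the first answer below, while the second answer calls it `your_table`.)

```sql
-- Assumed table name: tbl (the question does not name the table)
CREATE TABLE tbl
(
    id      int          NOT NULL PRIMARY KEY,
    machine int          NOT NULL,
    type    char(1)      NOT NULL,
    value   decimal(4,2) NOT NULL
);

INSERT INTO tbl (machine, type, value, id)
VALUES (1, 'a', 0.34, 1),
       (1, 'b', 0.23, 2),
       (1, 'b', 0.26, 3),
       (1, 'b', 0.35, 4),
       (1, 'a', 0.15, 5),
       (1, 'b', 0.45, 6),
       (1, 'b', 0.55, 7);
```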
CodePudding user response:
Use `CROSS APPLY` to obtain the last "a" value:
```sql
select t.machine, t.value, v.value_col
from tbl t
cross apply
(
    -- most recent 'a' row before the current row
    select top 1 value_col = value
    from tbl x
    where x.id < t.id
      and x.type = 'a'
    order by id desc
) v
where t.type = 'b'
```
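One caveat worth adding to this answer: `CROSS APPLY` silently drops any "b" row that has no preceding "a" row. If such rows should be kept with a NULL `value_col` instead, the same query works with `OUTER APPLY`:

```sql
-- OUTER APPLY keeps 'b' rows even when no earlier 'a' row exists (value_col is NULL)
select t.machine, t.value, v.value_col
from tbl t
outer apply
(
    select top 1 value_col = value
    from tbl x
    where x.id < t.id
      and x.type = 'a'
    order by id desc
) v
where t.type = 'b'
```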
CodePudding user response:
It can be done in three steps, which makes me feel there are cleaner ways (though I can't think of them):

- create an identifier to relate all rows which should share an "a" value
- use a window function to copy that "a" value to the rest of the partition
- filter to include only the "b" rows
For example...
```sql
WITH
a_partitioned AS
(
    -- running count of 'a' rows: every 'a' row starts a new partition identifier
    SELECT
        *,
        SUM(CASE WHEN type = 'a' THEN 1 ELSE 0 END)
            OVER (PARTITION BY machine ORDER BY id) AS a_partition
    FROM
        your_table
),
a_value_spread AS
(
    -- copy each partition's 'a' value to every row in that partition
    SELECT
        *,
        MAX(CASE WHEN type = 'a' THEN value END)
            OVER (PARTITION BY machine, a_partition) AS a_value
    FROM
        a_partitioned
)
SELECT
    *
FROM
    a_value_spread
WHERE
    type = 'b'
```
Demo : https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=9037c0d3e6d46e05ecbcfdd3777808cd
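(An aside, not part of the original answer: SQL Server 2022 added `IGNORE NULLS` to `LAST_VALUE`, which collapses the two CTEs into a single windowed pass. A sketch, assuming the same `your_table`:)

```sql
-- Requires SQL Server 2022+ (IGNORE NULLS support on LAST_VALUE)
SELECT machine, value, a_value
FROM
(
    SELECT
        *,
        -- carry the most recent 'a' value forward across each machine's rows
        LAST_VALUE(CASE WHEN type = 'a' THEN value END) IGNORE NULLS
            OVER (PARTITION BY machine
                  ORDER BY id
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS a_value
    FROM your_table
) AS s
WHERE type = 'b';
```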
EDIT
Here's my REALLY superficial attempt to experiment with which answer might have the lowest processing overheads...
- In the correlated sub-query version, the scan of the data is 5% of the total cost, so the whole query 'costs' ~20 index scans.
- In the analytic function version, the scan of the data is 12% of the total cost, so the whole query 'costs' ~8.3 index scans.
It's a very small data set though, and a very arbitrarily constructed one. If performance matters, I'd run an actual profiler against your actual data.

(If the correlated sub-query approach is 'fast enough', definitely use that.)
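(One easy way to run that comparison yourself, my suggestion rather than the answerer's, is SQL Server's built-in statistics output:)

```sql
SET STATISTICS IO ON;   -- report logical/physical reads per statement
SET STATISTICS TIME ON; -- report parse/compile and execution CPU/elapsed times

-- ... run both candidate queries here against the real table ...

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```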