I tried everything but still can't group values that belong to SHIP_ID on the same row.
How can I have all values belonging to the same SHIP_ID on the same row?
For example this is my original table:
DISTANCE_CODE | SHIP_ID | SHIP_COUNTRY | Unique_key |
---|---|---|---|
1245_COD | 54 | ||
Tunisia | 58 | ||
1C254 | 60 | ||
1100_COD | 82 | ||
Oman | 85 | ||
E9663 | 91 | ||
J9668 | 93 | ||
R9664 | 96 | ||
S9669 | 98 | ||
T9662 | 101 | ||
79663 | 106 | ||
E1661 | 108 | ||
1245_CR | 110 | ||
Chile | 115 | ||
99Z254 | 116 | ||
55X33 | 121 |
Expected results:
DISTANCE_CODE | SHIP_ID | SHIP_COUNTRY |
---|---|---|
1C254 | 1245_COD | Tunisia |
E9663 | 1100_COD | Oman |
J9668 | 1100_COD | Oman |
R9664 | 1100_COD | Oman |
S9669 | 1100_COD | Oman |
T9662 | 1100_COD | Oman |
79663 | 1100_COD | Oman |
E1661 | 1100_COD | Oman |
99Z254 | 1245_CR | Chile |
55X33 | 1245_CR | Chile |
Thank you in advance!
CodePudding user response:
I really hope that this dataset is being fixed, however here is some code that produces the output you are looking for. I do not know if your examples had the same number of columns as your dataset, but in my example the table in question was called Table_1
I assumed that UNIQUE_KEY
was always increasing.
-- Start by obtaining all of the SHIP_IDs
WITH Ships AS (
SELECT *
FROM Table_1
WHERE SHIP_ID IS NOT NULL
-- Compute the unique key of the next ship id for this set.
), NextShips AS (
SELECT
Ships.UNIQUE_KEY as ShipKey,
MIN(NextShip.UNIQUE_KEY) as NextShipKey
FROM Ships
LEFT JOIN Ships as NextShip ON NextShip.UNIQUE_KEY > Ships.UNIQUE_KEY
GROUP BY Ships.UNIQUE_KEY
-- Due to the calculation above, the last ship id will always have NULL as the next ship id.
), NullFix AS (
SELECT
ShipKey,
CASE WHEN NextShipKey IS NULL THEN (SELECT TOP 1 UNIQUE_KEY FROM Table_1 ORDER BY UNIQUE_KEY DESC) ELSE NextShipKey END as NextShipKey
FROM NextShips
)
-- Join the table onto itself, once for each column, being very mindful of which columns come from where.
SELECT
DistanceCodes.DISTANCE_CODE
, ShipId.SHIP_ID
, ShipCountries.SHIP_COUNTRY
FROM NullFix
JOIN Table_1 AS ShipId ON NullFix.ShipKey = ShipId.UNIQUE_KEY
JOIN Table_1 as DistanceCodes ON DistanceCodes.UNIQUE_KEY >= NullFix.ShipKey AND DistanceCodes.UNIQUE_KEY <= NullFix.NextShipKey AND DistanceCodes.DISTANCE_CODE IS NOT NULL
JOIN Table_1 as ShipCountries ON ShipCountries.UNIQUE_KEY >= NullFix.ShipKey AND ShipCountries.UNIQUE_KEY <= NullFix.NextShipKey AND ShipCountries.SHIP_COUNTRY IS NOT NULL
CodePudding user response:
Here is a possible solution using a stored procedure.
create procedure sp_seq
as
create table #shipping (
[DISTANCE_CODE] [nvarchar](255) NULL,
[SHIP_ID] [nvarchar](255) NULL,
[SHIP_COUNTRY] [nvarchar](255) NULL,
)
declare cur cursor for
select [DISTANCE_CODE]
,[SHIP_ID]
,[SHIP_COUNTRY]
from [dbo].[shipping] order by [Unique_key]
declare @DISTANCE_CODE varchar(255), @SHIP_ID varchar(255), @SHIP_COUNTRY varchar(255)
declare @distance varchar(255), @id varchar(255), @country varchar(255)
open cur
fetch next from cur into @DISTANCE_CODE,@SHIP_ID,@SHIP_COUNTRY
while @@FETCH_STATUS =0
begin
if (@SHIP_ID is not null and @SHIP_ID != @id)
select @distance=null,@id=null,@country=null
if (@DISTANCE_CODE is not null) set @distance = @DISTANCE_CODE
if (@SHIP_ID is not null) set @id = @SHIP_ID
if (@SHIP_COUNTRY is not null) set @country = @SHIP_COUNTRY
if (@distance is not null and @id is not null and @country is not null)
insert into #shipping(DISTANCE_CODE,SHIP_ID,SHIP_COUNTRY)
select @distance,@id,@country
fetch next from cur into @DISTANCE_CODE,@SHIP_ID,@SHIP_COUNTRY
end
close cur
deallocate cur
select * from #shipping
go
Result:
exec sp_seq
/*
DISTANCE_CODE SHIP_ID SHIP_COUNTRY
1C254 1245_COD Tunisia
E9663 1100_COD Oman
J9668 1100_COD Oman
R9664 1100_COD Oman
S9669 1100_COD Oman
T9662 1100_COD Oman
79663 1100_COD Oman
E1661 1100_COD Oman
99Z254 1245_CR Chile
55X33 1245_CR Chile
*/
CodePudding user response:
Just thought I would share another solution.
WITH cte_set AS (
-- Generate GroupID that increments each time SHIP_ID is populated
SELECT s.DISTANCE_CODE, s.SHIP_ID, s.SHIP_COUNTRY
, GroupID = COUNT(s.SHIP_ID) OVER (ORDER BY s.Unique_Key ROWS UNBOUNDED PRECEDING)
FROM #ship s
), cte_ships AS (
-- Return table of SHIP_ID SHIP_COUNTRY to GroupID mappings
SELECT x.GroupID
, SHIP_ID = MAX(x.SHIP_ID), SHIP_COUNTRY = MAX(x.SHIP_COUNTRY)
FROM cte_set x
GROUP BY x.GroupID
), cte_dc AS (
-- Return table of DISTANCE_CODE to GroupID mappings
SELECT x.DISTANCE_CODE, x.GroupID
FROM cte_set x
WHERE x.DISTANCE_CODE IS NOT NULL
)
SELECT d.DISTANCE_CODE, s.SHIP_ID, s.SHIP_COUNTRY
FROM cte_dc d
JOIN cte_ships s ON s.GroupID = d.GroupID;
The concept here was that I first generated a unique "GroupID" based on whether the SHIP_ID
was populated. Each time SHIP_ID
was populated, the GroupID
incremented.
Then from that dataset, I generated two new sets...one set contains the SHIP_ID
, SHIP_COUNTRY
, GroupID
. The second set contains the DISTANCE_CODE
, GroupID
. And then I joined it all together.
I'd also like to note that it's fairly efficient as well, at least compared to other methods I tried. I ran this against a 2 million record table, and it took 19 seconds to output the results to a temp table.
Sample data
IF OBJECT_ID('tempdb..#ship','U') IS NOT NULL DROP TABLE #ship; --SELECT * FROM #ship
CREATE TABLE #ship (
DISTANCE_CODE varchar(6) NULL,
SHIP_ID varchar(8) NULL,
SHIP_COUNTRY varchar(20) NULL,
Unique_Key int NOT NULL PRIMARY KEY CLUSTERED,
)
INSERT INTO #ship (DISTANCE_CODE, SHIP_ID, SHIP_COUNTRY, Unique_Key)
VALUES (NULL,'1245_COD',NULL,54)
, (NULL,NULL,'Tunisia',58)
, ('1C254',NULL,NULL,60)
, (NULL,'1100_COD',NULL,82)
, (NULL,NULL,'Oman',85)
, ('E9663',NULL,NULL,91)
, ('J9668',NULL,NULL,93)
, ('R9664',NULL,NULL,96)
, ('S9669',NULL,NULL,98)
, ('T9662',NULL,NULL,101)
, ('79663',NULL,NULL,106)
, ('E1661',NULL,NULL,108)
, (NULL,'1245_CR',NULL,110)
, (NULL,NULL,'Chile',115)
, ('99Z254',NULL,NULL,116)
, ('55X33',NULL,NULL,121);