So I have an adjacency list that forms a hierarchy simulating a versioned file structure. The problem is that the incoming file names are not currently unique and they need to be. To make things slightly more interesting the files may have different versions which should keep the name of the first version (note the versions all have the same NodeID).
Adjacency List
ParentID | NodeID | VersionNum | FileName |
---|---|---|---|
-1 | 1 | 1 | FirstFolder |
1 | 2 | 1 | SecondFolder |
1 | 3 | 1 | ThirdFolder |
1 | 4 | 1 | FirstDocument |
1 | 4 | 2 | FirstDocument |
1 | 5 | 1 | FirstDocument |
1 | 5 | 2 | FirstDocument |
2 | 6 | 1 | FirstDocument |
2 | 6 | 2 | FirstDocument |
2 | 7 | 1 | SecondDocument |
3 | 8 | 1 | SecondDocument |
3 | 9 | 1 | ThirdDocument |
3 | 9 | 2 | ThirdDocument |
3 | 10 | 1 | ThirdDocument |
3 | 11 | 1 | ThirdDocument |
Targeted Result
ParentID | NodeID | VersionNum | FileName |
---|---|---|---|
-1 | 1 | 1 | FirstFolder |
1 | 2 | 1 | SecondFolder |
1 | 3 | 1 | ThirdFolder |
1 | 4 | 1 | FirstDocument |
1 | 4 | 2 | FirstDocument |
1 | 5 | 1 | FirstDocument_1 |
1 | 5 | 2 | FirstDocument_1 |
2 | 6 | 1 | FirstDocument |
2 | 6 | 2 | FirstDocument |
2 | 7 | 1 | SecondDocument |
3 | 8 | 1 | SecondDocument |
3 | 9 | 1 | ThirdDocument |
3 | 9 | 2 | ThirdDocument |
3 | 10 | 1 | ThirdDocument_1 |
3 | 11 | 1 | ThirdDocument_2 |
*I should also note that the folder names are already guaranteed to be unique (they already exist, it is the documents that are incoming) and they only have 1 version.
CREATE TABLE #tmp_tree
(
ParentID INT,
NodeID INT,
VersionNum INT,
FileName VARCHAR(50),
);
INSERT INTO #tmp_tree (ParentID, NodeID, VersionNum, FileName)
VALUES (-1, 1, 1, 'FirstFolder' ),
(1, 2, 1, 'SecondFolder' ),
(1, 3, 1, 'ThirdFolder' ),
(1, 4, 1, 'FirstDocument' ),
(1, 4, 2, 'FirstDocument' ),
(1, 5, 1, 'FirstDocument' ),
(1, 5, 2, 'FirstDocument' ),
(2, 6, 1, 'FirstDocument' ),
(2, 6, 2, 'FirstDocument' ),
(2, 7, 1, 'SecondDocument' ),
(3, 8, 1, 'SecondDocument' ),
(3, 9, 1, 'ThirdDocument' ),
(3, 9, 2, 'ThirdDocument' ),
(3, 10, 1, 'ThirdDocument' )
(3, 11, 1, 'ThirdDocument' )
I really don't know how to approach this though resorting to a stored procedure. Adjacency list scream CTEs to me but that got me no where real fast. Group By loses the NodeID so while I can find the names of the documents that need to be renamed - I don't know how to use that to select the second occurrence of the name (ordered by NodeID).
-- I don't see how this helps... but this finds the names that need to change.
select ParentID, FileName,VersionNum, count(*) from #tmp_tree
GROUP BY ParentID, FileName, VersionNum
HAVING VersionNum = 1 and count(*) > 1
order by FileName
I know how to solve this procedural but not declaratively.
I don't know if this is closer or farther away from the solution:
select f2.*, Row_Number() over (order by f2.FileName) from
(select top 10 f.*, count(FileName) over (PARTITION by ParentID, FileName) as n from (select * from #tmp_tree where versionNum = 1) as f
order by f.ParentID, f.FileName) as f2
Where n > 1
CodePudding user response:
I would assume the last line (3, 11) in the targeted result is a mistake.
You can find the repeated names with a window function in a subquery and then join it during the update. In short, you can do:
update #tmp_tree
set #tmp_tree.filename = concat(#tmp_tree.filename, '_', x.rn)
from #tmp_tree
join (
select *,
row_number() over(partition by parentid, filename order by nodeid) as rn
from #tmp_tree
where versionnum = 1
) x on x.rn > 1 and x.nodeid = #tmp_tree.nodeid;
Result:
ParentID NodeID VersionNum FileName
--------- ------- ----------- ---------------
-1 1 1 FirstFolder
1 2 1 SecondFolder
1 3 1 ThirdFolder
1 4 1 FirstDocument
1 4 2 FirstDocument
1 5 1 FirstDocument_2
1 5 2 FirstDocument_2
2 6 1 FirstDocument
2 6 2 FirstDocument
2 7 1 SecondDocument
3 8 1 SecondDocument
3 9 1 ThirdDocument
3 9 2 ThirdDocument
3 10 1 ThirdDocument_2
See running example at db<>fiddle.