I have a table on a format as below:
CREATE TABLE my_data
(
category TEXT,
projectName TEXT,
fileName TEXT,
fileLine TEXT,
fdate DATETIME,
-- other columns...
groupid NUMBER
);
Currently, groupid
is NULL in all rows.
I need to group by some columns only, and add an auto-incremented ID per group. These columns are projectName
, fileName
and fileLine
.
This means that, all rows within a group (namely, with the same values on those three columns), should be updated with the same ID. The query should assign the auto-incremented IDs (meaning, there are no values now).
How can I achieve this?
I saw something similar here Auto-increment with Group BY but it doesn't work, because my database is SQLite, and apparently some reserved words don't work there.
I considered doing a nested query such as an UPDATE-SET-FROM
, where there is a nested group by
on the from, but I don't know how to handle the auto-incremented ID assignment per-group.
CodePudding user response:
You may use DENSE_RANK
function as the following:
With CTE As
(
Select category, projectName, fileName, fileLine, fdate, groupid,
DENSE_RANK() Over (Order By projectName, fileName, fileLine) As grp
From my_data
)
Update my_data Set groupid= CTE.grp
From CTE
Where CTE.projectName = my_data.projectName And
CTE.fileName= my_data.fileName And
CTE.fileLine = my_data.fileLine
See a demo from db<>fiddle.
I think that you don't need to store this value, since you can simply get it with a select statement:
Select category, projectName, fileName, fileLine, fdate,
DENSE_RANK() Over (Order By projectName, fileName, fileLine) As groupid
From my_data
CodePudding user response:
Don't do this.
Such a table violates database normalization, because your real groups (projectName fileName fileLine) can get out of sync with the group ID. Instead create a group table:
CREATE TABLE my_group
(
groupid INTEGER PRIMARY KEY,
projectName TEXT,
fileName TEXT,
fileLine TEXT,
/* other columns ? */
UNIQUE(projectName, fileName, fileLine)
);
Then fill this table from the original one:
INSERT INTO my_group (projectName, fileName, fileLine)
SELECT DISTINCT projectName, fileName, fileLine
FROM my_data;
If there are more columns you need to copy, then there are two options:
Option #1: Continue using DISTINCT
. That will fail, if these column values are not unique for a group, because of the unique constraint, and would thus point you to data inconsistencies in your original table.
INSERT INTO my_group (projectName, fileName, fileLine, other_column)
SELECT DISTINCT projectName, fileName, fileLine, other_column
FROM my_data;
Option #2: Tell the DBMS which values to pick (usually the minimum, maximum or sum found for a group):
INSERT INTO my_group (projectName, fileName, fileLine, other_column)
SELECT projectName, fileName, fileLine, MIN(other_column)
FROM my_data
GROUP BY projectName, fileName, fileLine;
Now that you have done that, your tables are already related and you have assigned a group ID for each group. If you want to stay with these tables related by their business key (projectName fileName fileLine), create the foreign key on them:
ALTER TABLE my_data
ADD CONSTRAINT fk_group
FOREIGN KEY (projectName, fileName, fileLine)
REFERENCES my_group (projectName, fileName, fileLine);
and you are done.
If you don't want to do that, but relate the tables by group ID instead, add that to your table:
ALTER TABLE my_data ADD COLUMN groupid INTEGER;
Then fill it:
UPDATE my_data
SET groupid =
(
SELECT groupid
FROM my_group
WHERE my_group.projectName = my_data.projectName
AND my_group.fileName = my_data.fileName
AND my_group.fileLine = my_data.fileLine
);
Then delete the redundant columns:
ALTER TABLE my_data DROP COLUMN projectName INTEGER;
ALTER TABLE my_data DROP COLUMN fileName INTEGER;
ALTER TABLE my_data DROP COLUMN fileLine INTEGER;
And then add the foreign key:
ALTER TABLE my_data
ADD CONSTRAINT fk_group
FOREIGN KEY (groupid)
REFERENCES my_group (groupid);