SQLITE - Update table with auto-incremented id by group-CodePudding

I have a table on a format as below:

CREATE TABLE my_data 
(
    category TEXT,
    projectName TEXT,
    fileName TEXT,
    fileLine TEXT,
    fdate DATETIME,
    -- other columns...
    groupid NUMBER
);

Currently, groupid is NULL in all rows.

I need to group by some columns only, and add an auto-incremented ID per group. These columns are projectName, fileName and fileLine.

This means that, all rows within a group (namely, with the same values on those three columns), should be updated with the same ID. The query should assign the auto-incremented IDs (meaning, there are no values now).

How can I achieve this?

I saw something similar here Auto-increment with Group BY but it doesn't work, because my database is SQLite, and apparently some reserved words don't work there.

I considered doing a nested query such as an UPDATE-SET-FROM, where there is a nested group by on the from, but I don't know how to handle the auto-incremented ID assignment per-group.

CodePudding user response：

You may use DENSE_RANK function as the following:

With CTE As
(
Select category, projectName, fileName, fileLine, fdate, groupid,
       DENSE_RANK() Over (Order By projectName, fileName, fileLine) As grp
From my_data
)
Update my_data Set groupid= CTE.grp
From CTE
Where CTE.projectName = my_data.projectName And
      CTE.fileName= my_data.fileName And
      CTE.fileLine = my_data.fileLine

See a demo from db<>fiddle.

I think that you don't need to store this value, since you can simply get it with a select statement:

Select category, projectName, fileName, fileLine, fdate, 
       DENSE_RANK() Over (Order By projectName, fileName, fileLine) As groupid
From my_data

CodePudding user response：

Don't do this.

Such a table violates database normalization, because your real groups (projectName fileName fileLine) can get out of sync with the group ID. Instead create a group table:

CREATE TABLE my_group 
(
  groupid      INTEGER PRIMARY KEY,
  projectName  TEXT,
  fileName     TEXT,
  fileLine     TEXT,
  /* other columns ? */
  UNIQUE(projectName, fileName, fileLine)
);

Then fill this table from the original one:

INSERT INTO my_group (projectName, fileName, fileLine)
  SELECT DISTINCT projectName, fileName, fileLine
  FROM my_data;

If there are more columns you need to copy, then there are two options:

Option #1: Continue using DISTINCT. That will fail, if these column values are not unique for a group, because of the unique constraint, and would thus point you to data inconsistencies in your original table.

INSERT INTO my_group (projectName, fileName, fileLine, other_column)
  SELECT DISTINCT projectName, fileName, fileLine, other_column
  FROM my_data;

Option #2: Tell the DBMS which values to pick (usually the minimum, maximum or sum found for a group):

INSERT INTO my_group (projectName, fileName, fileLine, other_column)
  SELECT projectName, fileName, fileLine, MIN(other_column)
  FROM my_data
  GROUP BY projectName, fileName, fileLine;

Now that you have done that, your tables are already related and you have assigned a group ID for each group. If you want to stay with these tables related by their business key (projectName fileName fileLine), create the foreign key on them:

ALTER TABLE my_data
ADD CONSTRAINT fk_group
  FOREIGN KEY (projectName, fileName, fileLine) 
  REFERENCES my_group (projectName, fileName, fileLine);

and you are done.

If you don't want to do that, but relate the tables by group ID instead, add that to your table:

ALTER TABLE my_data ADD COLUMN groupid INTEGER;

Then fill it:

UPDATE my_data
SET groupid =
(
  SELECT groupid
  FROM my_group
  WHERE my_group.projectName = my_data.projectName
    AND my_group.fileName = my_data.fileName
    AND my_group.fileLine = my_data.fileLine
);

Then delete the redundant columns:

ALTER TABLE my_data DROP COLUMN projectName INTEGER;
ALTER TABLE my_data DROP COLUMN fileName INTEGER;
ALTER TABLE my_data DROP COLUMN fileLine INTEGER;

And then add the foreign key:

ALTER TABLE my_data
ADD CONSTRAINT fk_group
  FOREIGN KEY (groupid) 
  REFERENCES my_group (groupid);