Home > other >  Converting a field to lower case and merging data in an sqlite database
Converting a field to lower case and merging data in an sqlite database

Time:12-22

I need to merge some randomly uppercased data that has been collected in an SQLite table key_val, such that key is always lowercase and no vals are lost. There is a unique compound index on key,val.

The initial data looks like this:

key|val
abc|1
abc|5
aBc|1
aBc|5
aBc|3
aBc|2
AbC|1
abC|3

The result after the merge would be

key|val
abc|1
abc|2
abc|3
abc|5

In my programmer brain, I would

for each `key` with upper case letters; 
  if a lower cased `key` is found with the same value
    then delete `key`
    else update `key` to lower case

Re implementing the loop has a sub query for each row found with upper case letters, to check if the val already exists as a lower case key

If it does, I can delete the cased key.

From there I can UPDATE key = lower(key) as the "duplicates" have been removed.

The first cut of the programming method of finding the dupes is:

SELECT * FROM key_val as parent 
WHERE parent.key != lower(parent.key) 
 AND 0 < (
           SELECT count(s.val) FROM key_val as s
           WHERE s.key = lower(parent.key) AND s.val = parent.val
         )
ORDER BY parent.key DESC;

I'm assuming there's a better way to do this in SQLite? The ON CONFLICT functionality seems to me like it should be able to handle the dupe deletion on UPDATE but I'm not seeing it.

CodePudding user response:

Honestly it might just be easier to create a new table and then insert into it. As it seems you really just want a distinct select here, use:

INSERT INTO kev_val_new ("key", val)
SELECT DISTINCT LOWER("key"), val
FROM key_val;

Once you have populated the new table, you may drop the old one, and then rename the new one to the previous name:

DROP TABLE key_val;
ALTER TABLE key_val_new RENAME TO key_val;

CodePudding user response:

I agree with @Tim that it would be easire to re-create table using simple select distict lower().. statement, but that's not always easy if table has dependant objects (indexes, triggers, views). In this case this can be done as sequence of two steps:

insert lowered keys which are not still there:

insert into t
select distinct lower(tr.key) as key, tr.val 
from t as tr
left join t as ts on ts.key = lower(tr.key) and ts.val = tr.val
where ts.key is null;

now when we have all lowered keys - remove other keys:

delete from t where key <> lower(key);

See fiddle: http://sqlfiddle.com/#!5/84db50/11 However this method assumes that key is always populated (otherwise it would be a strange key)

If vals can be null then "ts.val = tr.val" should be replaced with more complex stuff like ifnull(ts.val, -1) = ifnull(tr.val, -1) where -1 is some unused value (can be different). If we can't assume any unused value like -1 then it should be more complex check for null / not null cases.

CodePudding user response:

First delete all the duplicates:

DELETE FROM key_val AS k1
WHERE EXISTS (
  SELECT 1 
  FROM key_val AS k2 
  WHERE LOWER(k2.key) = LOWER(k1.key) AND k2.val = k1.val AND k2.rowid < k1.rowid 
);

by keeping only 1 combination of key and val with the min rowid.
It is not important if you kept the key with all lower chars or not, because the 2nd step is to update the table:

UPDATE key_val
SET key = LOWER(key);

See the demo.

  • Related