MySQL distinct JSON key values

Time:05-10

I have a MySQL table with a column containing JSON with arbitrary keys and values. Using the query below I can get the keys and values for all ids, but as you can see, the result contains duplicate packages.

CREATE TABLE `my_table` (
  `package` mediumtext NOT NULL,
  `id` varchar(255) NOT NULL,
  `time` timestamp NOT NULL DEFAULT current_timestamp(),
  KEY `id` (`id`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb3;
INSERT INTO my_table (id, time, package) VALUES
    ('myhost', '2022-05-08 09:00:00', '{"acl": "2.3.1-1", "apparmor": "2.0.4-2ubuntu2", "at": "3.2.5-1ubuntu1"}'),
    ('myhost', '2022-05-09 09:00:00', '{"acl": "2.3.1-1", "apparmor": "2.0.4-2ubuntu2", "at": "3.2.5-1ubuntu1"}'),
    ('myhost', '2022-05-10 09:00:00', '{"acl": "3.4.5-6", "apparmor": "2.0.4-2ubuntu2", "at": "3.2.5-1ubuntu1"}'),
    ('host123', '2022-05-10 09:00:00', '{"httpd": "2.4.6-97-el7.centos.5", "kpartx": "0.4.9-135.el7_9", "libcap": "2.22-11.el7"}');
select id, time, package from my_table;
+---------+---------------------+------------------------------------------------------------------------------------------+
| id      | time                | package                                                                                  |
+---------+---------------------+------------------------------------------------------------------------------------------+
| myhost  | 2022-05-08 09:00:00 | {"acl": "2.3.1-1", "apparmor": "2.0.4-2ubuntu2", "at": "3.2.5-1ubuntu1"}                 |
| myhost  | 2022-05-09 09:00:00 | {"acl": "2.3.1-1", "apparmor": "2.0.4-2ubuntu2", "at": "3.2.5-1ubuntu1"}                 |
| myhost  | 2022-05-10 09:00:00 | {"acl": "3.4.5-6", "apparmor": "2.0.4-2ubuntu2", "at": "3.2.5-1ubuntu1"}                 |
| host123 | 2022-05-10 09:00:00 | {"httpd": "2.4.6-97-el7.centos.5", "kpartx": "0.4.9-135.el7_9", "libcap": "2.22-11.el7"} |
+---------+---------------------+------------------------------------------------------------------------------------------+
SELECT     id,time,pkg,Json_unquote(Json_extract(package, Concat('$.', pkg))) AS version
FROM       my_table
CROSS JOIN json_table(Json_keys(package,'$'), '$[*]' columns (pkg text path '$')) j
ORDER BY   pkg;
+---------+---------------------+----------+-----------------------+
| id      | time                | pkg      | version               |
+---------+---------------------+----------+-----------------------+
| myhost  | 2022-05-08 09:00:00 | acl      | 2.3.1-1               |
| myhost  | 2022-05-09 09:00:00 | acl      | 2.3.1-1               |
| myhost  | 2022-05-10 09:00:00 | acl      | 3.4.5-6               |
| myhost  | 2022-05-08 09:00:00 | apparmor | 2.0.4-2ubuntu2        |
| myhost  | 2022-05-09 09:00:00 | apparmor | 2.0.4-2ubuntu2        |
| myhost  | 2022-05-10 09:00:00 | apparmor | 2.0.4-2ubuntu2        |
| myhost  | 2022-05-08 09:00:00 | at       | 3.2.5-1ubuntu1        |
| myhost  | 2022-05-09 09:00:00 | at       | 3.2.5-1ubuntu1        |
| myhost  | 2022-05-10 09:00:00 | at       | 3.2.5-1ubuntu1        |
| host123 | 2022-05-10 09:00:00 | httpd    | 2.4.6-97-el7.centos.5 |
| host123 | 2022-05-10 09:00:00 | kpartx   | 0.4.9-135.el7_9       |
| host123 | 2022-05-10 09:00:00 | libcap   | 2.22-11.el7           |
+---------+---------------------+----------+-----------------------+

How do I adjust my query so that it filters out the duplicate packages? I only want to keep one row per id and pkg, namely the one with the latest time:

+---------+---------------------+----------+-----------------------+
| id      | time                | pkg      | version               |
+---------+---------------------+----------+-----------------------+
| myhost  | 2022-05-10 09:00:00 | acl      | 3.4.5-6               |
| myhost  | 2022-05-10 09:00:00 | apparmor | 2.0.4-2ubuntu2        |
| myhost  | 2022-05-10 09:00:00 | at       | 3.2.5-1ubuntu1        |
| host123 | 2022-05-10 09:00:00 | httpd    | 2.4.6-97-el7.centos.5 |
| host123 | 2022-05-10 09:00:00 | kpartx   | 0.4.9-135.el7_9       |
| host123 | 2022-05-10 09:00:00 | libcap   | 2.22-11.el7           |
+---------+---------------------+----------+-----------------------+

CodePudding user response:

You can use the ROW_NUMBER window function in a subquery to number the rows of each (id, pkg) pair from newest to oldest, then keep only the first row of each pair:

Query #1

SELECT id,time,pkg,version
FROM (
 SELECT     id,time,pkg,Json_unquote(Json_extract(package, Concat('$.', pkg))) AS version,
            ROW_NUMBER() OVER(PARTITION BY id,pkg ORDER BY time DESC,Json_unquote(Json_extract(package, Concat('$.', pkg)))) rn
 FROM       my_table
 CROSS JOIN json_table(Json_keys(package,'$'), '$[*]' columns (pkg text path '$')) j
) t1
WHERE rn = 1;
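+---------+---------------------+----------+-----------------------+
| id      | time                | pkg      | version               |
+---------+---------------------+----------+-----------------------+
| host123 | 2022-05-10 09:00:00 | httpd    | 2.4.6-97-el7.centos.5 |
| host123 | 2022-05-10 09:00:00 | kpartx   | 0.4.9-135.el7_9       |
| host123 | 2022-05-10 09:00:00 | libcap   | 2.22-11.el7           |
| myhost  | 2022-05-10 09:00:00 | acl      | 3.4.5-6               |
| myhost  | 2022-05-10 09:00:00 | apparmor | 2.0.4-2ubuntu2        |
| myhost  | 2022-05-10 09:00:00 | at       | 3.2.5-1ubuntu1        |
+---------+---------------------+----------+-----------------------+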
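
If every row holds the complete package list for its id (as it does in the sample data), a variant worth considering is to pick the newest row per id first and expand only that row's keys. The following is a sketch under that assumption, not a drop-in replacement: a package that is missing from the newest row would not appear at all, and if two rows of the same id share the same time, which one wins is arbitrary. The quoted path form `$."key"` is an addition here so that package names containing dots or dashes should still resolve; it has not been checked against every MySQL or MariaDB version.

```sql
-- Sketch: assumes each row is a full package snapshot for its id,
-- so only the newest row per id needs to be expanded into keys.
SELECT t.id,
       t.time,
       j.pkg,
       -- Quote the key inside the JSON path so names such as "python3.8"
       -- are treated as a single key rather than as nested path steps.
       JSON_UNQUOTE(JSON_EXTRACT(t.package, CONCAT('$."', j.pkg, '"'))) AS version
FROM (
    -- Number the rows of each id from newest to oldest.
    SELECT id, time, package,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY time DESC) AS rn
    FROM my_table
) AS t
CROSS JOIN JSON_TABLE(JSON_KEYS(t.package), '$[*]'
                      COLUMNS (pkg TEXT PATH '$')) AS j
WHERE t.rn = 1            -- keep only the newest snapshot per id
ORDER BY t.id, j.pkg;
```

Compared with Query #1, this numbers one row per table row instead of one row per key, which may matter when the JSON objects are large. When a package can be present in an older row but absent from the newest one and should still be reported, Query #1 is the one to use.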
