I have a MySQL table with a column that contains JSON with arbitrary keys/values. Using the query below I can get the keys/values for all ids, but, as you can see, the result contains duplicate packages.
CREATE TABLE `my_table` (
`package` mediumtext NOT NULL,
`id` varchar(255) NOT NULL,
`time` timestamp NOT NULL DEFAULT current_timestamp(),
KEY `id` (`id`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb3;
INSERT INTO my_table (id, time, package) VALUES
('myhost', '2022-05-08 09:00:00', '{"acl": "2.3.1-1", "apparmor": "2.0.4-2ubuntu2", "at": "3.2.5-1ubuntu1"}'),
('myhost', '2022-05-09 09:00:00', '{"acl": "2.3.1-1", "apparmor": "2.0.4-2ubuntu2", "at": "3.2.5-1ubuntu1"}'),
('myhost', '2022-05-10 09:00:00', '{"acl": "3.4.5-6", "apparmor": "2.0.4-2ubuntu2", "at": "3.2.5-1ubuntu1"}'),
('host123', '2022-05-10 09:00:00', '{"httpd": "2.4.6-97-el7.centos.5", "kpartx": "0.4.9-135.el7_9", "libcap": "2.22-11.el7"}');
select id, time, package from my_table;
+---------+---------------------+------------------------------------------------------------------------------------------+
| id      | time                | package                                                                                  |
+---------+---------------------+------------------------------------------------------------------------------------------+
| myhost  | 2022-05-08 09:00:00 | {"acl": "2.3.1-1", "apparmor": "2.0.4-2ubuntu2", "at": "3.2.5-1ubuntu1"}                 |
| myhost  | 2022-05-09 09:00:00 | {"acl": "2.3.1-1", "apparmor": "2.0.4-2ubuntu2", "at": "3.2.5-1ubuntu1"}                 |
| myhost  | 2022-05-10 09:00:00 | {"acl": "3.4.5-6", "apparmor": "2.0.4-2ubuntu2", "at": "3.2.5-1ubuntu1"}                 |
| host123 | 2022-05-10 09:00:00 | {"httpd": "2.4.6-97-el7.centos.5", "kpartx": "0.4.9-135.el7_9", "libcap": "2.22-11.el7"} |
+---------+---------------------+------------------------------------------------------------------------------------------+
SELECT id,time,pkg,Json_unquote(Json_extract(package, Concat('$.', pkg))) AS version
FROM my_table
CROSS JOIN json_table(Json_keys(package,'$'), '$[*]' columns (pkg text path '$')) j
ORDER BY pkg;
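The query above builds each lookup path as '$.' followed by the key name. In MySQL 8.0 an unquoted key in a JSON path must be a valid ECMAScript identifier, so a package key containing a hyphen or a plus sign (hypothetical keys such as libc6-dev or g++, which are not in the sample data) would make the path invalid and the query would fail. A minimal sketch, assuming MySQL 8.0 path syntax, that wraps the key in double quotes inside the path; VARCHAR(255) for the key column is an assumed size:

```sql
-- Same expansion as above, but the key is double-quoted inside the path,
-- so keys that are not valid identifiers (e.g. hypothetical "libc6-dev")
-- are still looked up instead of raising an invalid-path error.
SELECT id,
       time,
       j.pkg,
       JSON_UNQUOTE(JSON_EXTRACT(package, CONCAT('$."', j.pkg, '"'))) AS version
FROM my_table
CROSS JOIN JSON_TABLE(
       JSON_KEYS(package, '$'),                     -- array of the object's keys
       '$[*]' COLUMNS (pkg VARCHAR(255) PATH '$')   -- one row per key
     ) AS j
ORDER BY j.pkg;
```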
+---------+---------------------+----------+-----------------------+
| id      | time                | pkg      | version               |
+---------+---------------------+----------+-----------------------+
| myhost  | 2022-05-08 09:00:00 | acl      | 2.3.1-1               |
| myhost  | 2022-05-09 09:00:00 | acl      | 2.3.1-1               |
| myhost  | 2022-05-10 09:00:00 | acl      | 3.4.5-6               |
| myhost  | 2022-05-08 09:00:00 | apparmor | 2.0.4-2ubuntu2        |
| myhost  | 2022-05-09 09:00:00 | apparmor | 2.0.4-2ubuntu2        |
| myhost  | 2022-05-10 09:00:00 | apparmor | 2.0.4-2ubuntu2        |
| myhost  | 2022-05-08 09:00:00 | at       | 3.2.5-1ubuntu1        |
| myhost  | 2022-05-09 09:00:00 | at       | 3.2.5-1ubuntu1        |
| myhost  | 2022-05-10 09:00:00 | at       | 3.2.5-1ubuntu1        |
| host123 | 2022-05-10 09:00:00 | httpd    | 2.4.6-97-el7.centos.5 |
| host123 | 2022-05-10 09:00:00 | kpartx   | 0.4.9-135.el7_9       |
| host123 | 2022-05-10 09:00:00 | libcap   | 2.22-11.el7           |
+---------+---------------------+----------+-----------------------+
How do I adjust my query so that it filters out the duplicate packages? I only want to keep one pkg/version row per id, taking the most recent one by time:
+---------+---------------------+----------+-----------------------+
| id      | time                | pkg      | version               |
+---------+---------------------+----------+-----------------------+
| myhost  | 2022-05-10 09:00:00 | acl      | 3.4.5-6               |
| myhost  | 2022-05-10 09:00:00 | apparmor | 2.0.4-2ubuntu2        |
| myhost  | 2022-05-10 09:00:00 | at       | 3.2.5-1ubuntu1        |
| host123 | 2022-05-10 09:00:00 | httpd    | 2.4.6-97-el7.centos.5 |
| host123 | 2022-05-10 09:00:00 | kpartx   | 0.4.9-135.el7_9       |
| host123 | 2022-05-10 09:00:00 | libcap   | 2.22-11.el7           |
+---------+---------------------+----------+-----------------------+
Answer:
You can use the ROW_NUMBER() window function in a subquery to number the rows of each (id, pkg) pair from newest to oldest, and then keep only the first (latest) row of each pair:
SELECT id, time, pkg, version
FROM (
    SELECT id, time, pkg,
           JSON_UNQUOTE(JSON_EXTRACT(package, CONCAT('$.', pkg))) AS version,
           -- rn = 1 marks the newest row per (id, pkg); version only breaks ties on time
           ROW_NUMBER() OVER (PARTITION BY id, pkg
                              ORDER BY time DESC,
                                       JSON_UNQUOTE(JSON_EXTRACT(package, CONCAT('$.', pkg)))) AS rn
    FROM my_table
    CROSS JOIN JSON_TABLE(JSON_KEYS(package, '$'), '$[*]' COLUMNS (pkg TEXT PATH '$')) AS j
) AS t1
WHERE rn = 1
ORDER BY id, pkg;
id | time | pkg | version |
---|---|---|---|
host123 | 2022-05-10 09:00:00 | httpd | 2.4.6-97-el7.centos.5 |
host123 | 2022-05-10 09:00:00 | kpartx | 0.4.9-135.el7_9 |
host123 | 2022-05-10 09:00:00 | libcap | 2.22-11.el7 |
myhost | 2022-05-10 09:00:00 | acl | 3.4.5-6 |
myhost | 2022-05-10 09:00:00 | apparmor | 2.0.4-2ubuntu2 |
myhost | 2022-05-10 09:00:00 | at | 3.2.5-1ubuntu1 |
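The ROW_NUMBER() query above keeps, for every package a host has ever reported, its most recent version, so a package that is missing from a host's newest row would still be listed with its last-seen version. If each package value is instead meant to be a full snapshot of the host (an assumption about the data, not something the question states), a hedged alternative is to pick the newest row per id first and only then expand its keys. This sketch assumes MySQL 8.0 and reuses the double-quoted path form so hyphenated keys also resolve:

```sql
-- Alternative sketch: treat package as a full per-host snapshot.
-- Step 1 (derived table t): rank each host's rows from newest to oldest.
-- Step 2: expand only the newest row's keys with JSON_TABLE.
SELECT t.id,
       t.time,
       j.pkg,
       JSON_UNQUOTE(JSON_EXTRACT(t.package, CONCAT('$."', j.pkg, '"'))) AS version
FROM (
    SELECT id, time, package,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY time DESC) AS rn
    FROM my_table
) AS t
CROSS JOIN JSON_TABLE(
       JSON_KEYS(t.package, '$'),                   -- keys of that snapshot only
       '$[*]' COLUMNS (pkg VARCHAR(255) PATH '$')
     ) AS j
WHERE t.rn = 1                                      -- keep the newest snapshot per id
ORDER BY t.id, j.pkg;
```

For the sample data both queries return the same six rows. If two rows for one id share the same newest time, ROW_NUMBER() picks one of them arbitrarily, since the table has no unique column to break the tie.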