I have a table holding a large amount of variable data, where the `key` field in particular has only 20-30 variants. The structure is as below:
CREATE TABLE `device_data` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `device_id` bigint(20) unsigned NOT NULL,
  `serialized` tinyint(1) NOT NULL DEFAULT 0,
  `system` tinyint(1) NOT NULL DEFAULT 0,
  `key` varchar(60) COLLATE utf8mb4_unicode_ci NOT NULL,
  `value` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `unique_device_data_key` (`device_id`,`key`),
  CONSTRAINT `device_data_device_id_foreign` FOREIGN KEY (`device_id`) REFERENCES `devices` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
The table holds about 80,000 records, but the `key` field has only 30-40 variants. The problem with this is that if a key value such as `key_name_here` is repeated 30,000 times, that's 13 bytes stored for every row × 30,000 rows, roughly 380 KB of duplicated text. I understand that I could make `key` an ENUM column, but the system itself is very dynamic and subject to change; it's not feasible to add a new enum value every time a new key appears.
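For context, this is what the enum route being ruled out would look like (the key list here is purely hypothetical): every new key would require a schema change along these lines, restating the full value list each time.

-- Hypothetical: the entire value list must be restated on every change.
ALTER TABLE `device_data`
  MODIFY `key` ENUM('firmware_version', 'last_seen', 'key_name_here')
    COLLATE utf8mb4_unicode_ci NOT NULL;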
So my question: would it make sense to add an index to the `key` field, in the hope that this would reduce overall data consumption and benefit performance, or would it simply be detrimental? It's worth adding that in 99% of cases we'd already be filtering on the `device_id` column, which makes me suspect an index on `key` would be entirely detrimental.
Thanks in advance, Chris.
CodePudding user response:
Storing the index costs some storage space, but that alone doesn't hurt performance much. It only takes about 1 MB of storage (roughly 80,000 rows × ~13 bytes per key value, before InnoDB page overhead). Unless you're targeting this database to be stored on a mobile device or IoT device, that's pretty insignificant storage-wise.
You already have an index that includes `key`: the one created for the UNIQUE KEY constraint. That index would not help a query that searches on `key` alone, but you said that in 99% of cases you filter on `device_id` as well. In that case, the compound index helps narrow the examined rows by both columns, because `device_id` is its leading column. So you don't need to index `key` separately.
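A quick way to confirm this is to check the query plan for your common case (the literal values below are hypothetical); you should see the existing compound index being chosen:

EXPLAIN SELECT `value`
FROM `device_data`
WHERE `device_id` = 42 AND `key` = 'key_name_here';
-- Expect: key = unique_device_data_key (a const/ref lookup, not a scan).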
A problem could arise if you occasionally run a query that searches on `key` without a condition on `device_id`. Such a query would do a table scan across all 80,000 rows, evaluating them row by row against your search condition, because the compound index cannot be used without its leading `device_id` column.
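For example, a key-only lookup like this one (again with a hypothetical key value) would show a full scan in its plan:

EXPLAIN SELECT `device_id`, `value`
FROM `device_data`
WHERE `key` = 'key_name_here';
-- Expect: type = ALL (full table scan), since no index leads with `key`.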
If you're willing to bear the performance drag in those cases, then you can avoid creating an additional index on `key`.
But it's not only a question of how often you run such queries. It might be that when you do run one, even though it's infrequent, you need it to perform better than a table scan.
Or it could be more important not to have the extra index, not because of the storage, but because every INSERT/UPDATE/DELETE against this table would then have to maintain three indexes instead of the current two. Whether to optimize for fast writes or fast reads depends on your application's needs.
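If you decide the occasional key-only reads matter more than that extra write cost, adding the index is a one-liner (the index name here is just a suggestion):

ALTER TABLE `device_data`
  ADD INDEX `device_data_key_index` (`key`);
-- Trades one extra index write per row change for indexed key-only lookups.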
This is something we can't answer for you; it's your judgment call.