I have a table with 109 million records (growing every day) - about 20GB now. This table collects data from many variables with a timestamp; it's a SCADA system. It's simple:
CREATE TABLE `INVIEW_LOGS` (
`id` bigint(20) NOT NULL,
`iws_conn_id` smallint(6) NOT NULL,
`prf_station_id` smallint(3) NOT NULL,
`iws_var_id` mediumint(9) NOT NULL,
`var_name` char(125) NOT NULL,
`dia_hora` datetime NOT NULL,
`valor` decimal(18,2) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
--
-- Indexes for dumped tables
--
--
-- Indexes for table `INVIEW_LOGS`
--
ALTER TABLE `INVIEW_LOGS`
ADD PRIMARY KEY (`id`),
ADD KEY `prf_station_id` (`prf_station_id`),
ADD KEY `var_name` (`var_name`),
ADD KEY `var_name_2` (`var_name`),
ADD KEY `var_name_3` (`var_name`),
ADD KEY `var_name_4` (`var_name`),
ADD KEY `var_name_5` (`var_name`);
I have to make charts, and I have to retrieve the last 10000 records for 6 or 7 variables to plot each chart.
The problem is that even with indexes and an optimized table, for some variables (some have millions of records, well beyond 10000), the retrieval time is long - about 28 seconds.
How can I tweak this?!
I changed it from InnoDB to MyISAM - it's a secondary database that is read during the entire day and only written to (just the daily records) during a short period at night.
EDITED - 2021/10/01
-- Table structure for table `INVIEW_LOGS`
--
CREATE TABLE `INVIEW_LOGS` (
`id` bigint(20) NOT NULL,
`iws_conn_id` smallint(6) NOT NULL,
`prf_station_id` smallint(3) NOT NULL,
`iws_var_id` mediumint(9) NOT NULL,
`var_name` char(125) NOT NULL,
`dia_hora` datetime NOT NULL,
`valor` decimal(18,2) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
--
-- Indexes for dumped tables
--
--
-- Indexes for table `INVIEW_LOGS`
--
ALTER TABLE `INVIEW_LOGS`
ADD PRIMARY KEY (`id`),
ADD KEY `prf_station_id` (`prf_station_id`),
ADD KEY `var_name` (`var_name`);
--
-- AUTO_INCREMENT for dumped tables
--
--
-- AUTO_INCREMENT for table `INVIEW_LOGS`
--
ALTER TABLE `INVIEW_LOGS`
MODIFY `id` bigint(20) NOT NULL AUTO_INCREMENT;
/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
I don't know why there were so many indexes on var_name, so I rebuilt that.
QUERIES
select dia_hora, valor
FROM INVIEW_LOGS
WHERE var_name='PRF.Posto.var_5.VALUE'
ORDER BY id DESC
LIMIT 10000;
-- 10000 rows in set (29.09 sec)
With an index on prf_station_id:
select dia_hora, valor
FROM INVIEW_LOGS
WHERE prf_station_id=12 AND var_name='PRF.Posto.var_5.VALUE'
ORDER BY id DESC
LIMIT 10000;
-- 10000 rows in set (21.13 sec)
CodePudding user response:
What is the average length of var_name? If it is relatively short, then your attempt to make the table ROW_FORMAT=FIXED is causing it to be 20GB when it could be perhaps only 8GB. This may lead to extra I/O, hence slower processing. Normalization may lead to even more savings.
DECIMAL(18,2) takes 9 bytes; this is an awfully big number for a SCADA system. What does valor represent? Would a 4-byte FLOAT (with about 7 significant digits) suffice?
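If a FLOAT does suffice, the column change could be sketched as below (hypothetical - verify first that ~7 significant digits are enough for valor, since FLOAT is approximate while DECIMAL is exact):
```sql
-- Hypothetical: FLOAT is 4 bytes vs 9 bytes for DECIMAL(18,2),
-- saving about 5 bytes per row across 109M rows.
ALTER TABLE INVIEW_LOGS
    MODIFY valor FLOAT NOT NULL;
```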
Toss the redundant indexes; they slow down INSERTs.
You are reaching for "the last 10000 records", but there is no index on dia_hora.
Does "for 6 or 7 variables" refer to different values of var_id?
Since "covering" indexes work differently between MyISAM and InnoDB, I really need to see the SELECT.
(From OP's Comment)
select dia_hora, valor
FROM INVIEW_LOGS
WHERE var_name='PRF.Posto.var_5.VALUE'
ORDER BY id DESC
LIMIT 10000;
-- 10000 rows in set (29.09 sec)
select dia_hora, valor
FROM INVIEW_LOGS
WHERE prf_station_id=12
AND var_name='PRF.Posto.var_5.VALUE'
ORDER BY id DESC
LIMIT 10000;
-- 10000 rows in set (21.13 sec)
More
In MyISAM, each row is currently taking about 160 bytes. Given that the average var_name is only 15 characters, there is an average of 125-15-1 = 109 bytes of trailing spaces per row in that CHAR column. Those spaces would go away if you switched to VARCHAR. That change would shrink the table size from 20GB to under 7GB. Switching to InnoDB (which is recommended for many reasons) would raise it back to about 20GB.
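As a sketch, assuming no var_name value actually needs more than 125 characters, the CHAR-to-VARCHAR change would be:
```sql
-- VARCHAR stores only the actual characters plus a 1-byte length prefix,
-- instead of padding every value to 125 bytes as CHAR(125) does.
-- Note: this ALTER rebuilds the table, which will take a while at 109M rows.
ALTER TABLE INVIEW_LOGS
    MODIFY var_name VARCHAR(125) NOT NULL;
```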
Moving to a lookup table would further shrink the main table, but that may or may not be worth the effort.
("var_id is the same as var_name" -- This is confusing.)
Those selects are likely to benefit from
INDEX(var_name, id)
INDEX(prf_station_id, var_name, id)
And get rid of these since they are now redundant:
KEY `prf_station_id` (`prf_station_id`),
KEY `var_name` (`var_name`);
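Putting that together, one possible statement is sketched below (the index names var_name_id and station_var_id are illustrative, and the rebuild will take a while on a table this size):
```sql
ALTER TABLE INVIEW_LOGS
    ADD INDEX var_name_id (var_name, id),          -- covers WHERE var_name=... ORDER BY id
    ADD INDEX station_var_id (prf_station_id, var_name, id),
    DROP INDEX prf_station_id,                     -- now a redundant prefix
    DROP INDEX var_name;                           -- now a redundant prefix
```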
(These index recommendations apply to both MyISAM and InnoDB.)
Graphing
10K points is a lot to be graphing. Wouldn't the graph be just as good with only 1K points? What is the sampling frequency? If, say, you are sampling every 5 seconds, then having a "summary table" with the average values for each minute would get it closer to 1K points to plot. This would let both the SELECT and the graphing package run much faster.
More on summary tables: http://mysql.rjweb.org/doc.php/summarytables
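A minimal sketch of such a summary table, assuming per-minute averages are acceptable (the table and column names here are illustrative, not from the original schema):
```sql
-- Illustrative schema: one row per variable per minute.
CREATE TABLE INVIEW_LOGS_MINUTE (
    var_name   VARCHAR(125)  NOT NULL,
    minuto     DATETIME      NOT NULL,   -- timestamp truncated to the minute
    valor_avg  DECIMAL(18,2) NOT NULL,
    n_amostras INT           NOT NULL,   -- sample count, useful for re-aggregation
    PRIMARY KEY (var_name, minuto)
);

-- Populate it (e.g. nightly) from the raw log for the previous day:
INSERT INTO INVIEW_LOGS_MINUTE (var_name, minuto, valor_avg, n_amostras)
SELECT var_name,
       DATE_FORMAT(dia_hora, '%Y-%m-%d %H:%i:00'),
       AVG(valor),
       COUNT(*)
FROM INVIEW_LOGS
WHERE dia_hora >= CURDATE() - INTERVAL 1 DAY
  AND dia_hora <  CURDATE()
GROUP BY var_name, DATE_FORMAT(dia_hora, '%Y-%m-%d %H:%i:00');
```
Charts covering a month or a year would then read from INVIEW_LOGS_MINUTE instead of the 109M-row raw table.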
CodePudding user response:
("var_id is the same as var_name" -- This is confusing.) It's quite redundant: the old code from Postgres uses var_id, but the new code on MySQL uses var_name.
I confess I hadn't realized that I could include more than one column in the same index. I will try your approach with those indexes.
We have variables with a high rate of records - sometimes one each second. Customers want an overview in monthly and yearly charts plotted in JavaScript, so I have to send a few thousand points per variable to the browser.