I am creating a website that does computations based on historical daily stock quotes. Right now I have 5 tables corresponding to different letters in the alphabet (organized by symbol).
This is an example of the create statements I am using to build the tables.
CREATE TABLE `EOD_QRSTU` (
`symbol` varchar(10) NOT NULL,
`open` decimal(10,5) DEFAULT NULL,
`close` decimal(10,5) DEFAULT NULL,
`high` decimal(10,5) DEFAULT NULL,
`low` decimal(10,5) DEFAULT NULL,
`volume` int(11) DEFAULT NULL,
`date` datetime NOT NULL,
PRIMARY KEY (`date`,`symbol`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The most common query is selecting the rows for a particular symbol between two dates. This table for example has about 500,000 rows right now and takes up to 8 seconds to return about 400 rows for a particular symbol within a two-year range. Why does it take so long? How do sites like Yahoo Finance load their charts so fast? I have been thinking about storing each stock's data in its own file in the file system instead, can anyone offer some insight into this approach? If it is fast enough this way I don't even need to limit the number of rows returned, I could simply return every line in the file. Would CSV be a suitable file format?
If it matters I am using PythonAnywhere to host the site, for now, and am using Python with Flask-SQLalchemy for the backend. I also plan on migrating to AWS soon, will that solve part of the issue?
CodePudding user response:
Reorder your primary key and put symbol
before date
.
Indexes work best when the fixed part of the index, the symbol
is before the range searched part, date
.
Examine this by looking at explain {query}
and seeing how many bytes of the index is used and the query plan.
Also don't make symbol
UTF8 if you only are using an ascii
character set.