I have the following query in an old database (MySql 5.7.16) that takes almost 45 seconds to run. The table tbl_flightno has some 5 million records, the tbl_airline around 12,000. It seems the database is a bit at the limit, and every now and then there are some orphan records generated. I haven't found the culprit for that yet.
So I'm currently checking every now and then for those orphans and then fix them. I am wondering now, if there is a better way to search for those orphans.
SELECT COUNT(DISTINCT N.World, N.AirlineCode) AS 'Orphans', COUNT(FlightNoID) AS 'Flights'
FROM tbl_flightno N
LEFT JOIN tbl_airline A ON A.World = N.World AND A.AirlineCode = N.AirlineCode
WHERE A.Airline IS NULL
However I'm not sure there is another, or better way.
Yes, updating the MySql version might benefit, also throwing more hardware would improve, but that would create much more work.
Thanks in advance for any hints.
EDIT: Added the additional information below:
Here is the EXPLAIN for the query.
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE N index World_Airline 81 5217525 100 Using index
1 SIMPLE A eq_ref PRIMARY,VUnique,vWorld,vAirline,vReadOnly PRIMARY 81 as.N.AirlineCode,as.N.World 1 10 Using where; Not exists
-- ----------------------------
-- Table structure for tbl_airline
-- ----------------------------
DROP TABLE IF EXISTS `tbl_airline`;
CREATE TABLE `tbl_airline` (
`AirlineCode` int(8) NOT NULL,
`World` varchar(25) NOT NULL,
`Airline` varchar(255) NOT NULL,
`Last_update` datetime DEFAULT NULL,
`Destinations` int(8) DEFAULT NULL,
`NoFlights` int(8) DEFAULT NULL,
`CityPairs` int(8) DEFAULT NULL,
`Headquarter` varchar(3) DEFAULT NULL,
`TZ` varchar(6) DEFAULT NULL,
`ReadOnly` int(1) NOT NULL DEFAULT '0',
`Code` varchar(10) DEFAULT NULL,
`Alliance` varchar(255) DEFAULT NULL,
`Stock` varchar(10) DEFAULT NULL,
`Country` varchar(255) DEFAULT NULL,
`LegalHome` varchar(255) DEFAULT NULL,
`Parent` varchar(255) DEFAULT NULL,
`Director` varchar(100) DEFAULT NULL,
`Founded` date DEFAULT NULL,
`Rating` varchar(5) DEFAULT NULL,
PRIMARY KEY (`AirlineCode`,`World`),
UNIQUE KEY `VUnique` (`World`,`AirlineCode`) USING BTREE,
KEY `vWorld` (`World`) USING BTREE,
KEY `vAirline` (`AirlineCode`) USING BTREE,
KEY `vReadOnly` (`World`,`ReadOnly`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
SET FOREIGN_KEY_CHECKS=1;
-- ----------------------------
-- Table structure for tbl_flightno
-- ----------------------------
DROP TABLE IF EXISTS `tbl_flightno`;
CREATE TABLE `tbl_flightno` (
`FlightNoID` bigint(8) unsigned NOT NULL AUTO_INCREMENT,
`FlightID` bigint(8) unsigned NOT NULL,
`World` varchar(25) NOT NULL,
`AirlineCode` int(8) NOT NULL,
`FlightNo` varchar(10) NOT NULL,
`Days` varchar(7) NOT NULL,
`TimeDep` time NOT NULL,
`TimeArr` time NOT NULL,
`ActType` varchar(3) NOT NULL,
`ActLink` varchar(6) NOT NULL,
`Operator` varchar(255) NOT NULL,
`Remarks` varchar(50) DEFAULT NULL,
`Validity` varchar(11) DEFAULT NULL,
`Distance` int(10) DEFAULT NULL,
`Duration` time DEFAULT NULL,
`Speed` int(10) DEFAULT NULL,
`Via` int(1) DEFAULT '0',
`AptFromC` varchar(3) DEFAULT NULL,
`AptDestC` varchar(3) DEFAULT NULL,
PRIMARY KEY (`FlightNoID`),
UNIQUE KEY `FlightNoID` (`FlightNoID`) USING BTREE,
KEY `World_Airline` (`World`,`AirlineCode`) USING BTREE,
KEY `DepTimes` (`TimeDep`,`FlightID`) USING BTREE,
KEY `FlightID` (`FlightID`) USING BTREE,
KEY `Distance` (`World`,`AirlineCode`,`Distance`) USING BTREE,
KEY `ActType` (`ActType`) USING BTREE,
KEY `Via` (`Via`) USING BTREE,
KEY `Remarks` (`World`,`Remarks`) USING BTREE,
KEY `ActLink` (`ActLink`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=25879501 DEFAULT CHARSET=utf8;
SET FOREIGN_KEY_CHECKS=1;
CodePudding user response:
You can try the following query:
SELECT COUNT(DISTINCT N.World, N.AirlineCode) AS Orphans
, COUNT(CASE WHEN NOT EXISTS (
SELECT 1 FROM tbl_airline A
WHERE A.World = N.World
AND A.AirlineCode = N.AirlineCode
) THEN 1
END) AS Flights
FROM tbl_flightno N;
CodePudding user response:
Your indexes are optimal and your query formulation is optimal. The problem is that it needs 5M * 12K checks.
If it is I/O bound then, please provide table sizes and value of innodb_buffer_pool_size
and the size of RAM. with these, I may have advice on how to cut back on I/O.
[An aside] There are several redundant indexes, but this won't impact the speed of that SELECT
PRIMARY KEY (`AirlineCode`,`World`),
UNIQUE KEY `VUnique` (`World`,`AirlineCode`) USING BTREE,
KEY `vWorld` (`World`) USING BTREE,
KEY `vAirline` (`AirlineCode`) USING BTREE,
KEY `vReadOnly` (`World`,`ReadOnly`) USING BTREE
-->
PRIMARY KEY (`AirlineCode`,`World`),
KEY (`World`,`AirlineCode`) USING BTREE,
KEY `vReadOnly` (`World`,`ReadOnly`) USING BTREE
In the other table, toss these two:
UNIQUE KEY `FlightNoID` (`FlightNoID`) USING BTREE,
KEY `World_Airline` (`World`,`AirlineCode`) USING BTREE,
"Rules":
- In MySQL
PRIMARY KEY
is aUNIQUE
key. - When you have
INDEX(a,b)
,INDEX(a)
is unnecessary.