Use Indexes For Join on Indexed DATETIME and Indexed DATE columns

Time:01-24

EDIT

I misread my initial error and attributed the unused index to the wrong columns.

I was able to recreate the issue I saw, and the solution that ysth suggested worked.

Below are the CREATE TABLE statements, the inserts into the tables, and two queries: one that shows the issue and one with the solution, which does not.

# Make tables and indices
DROP TABLE IF EXISTS a;
DROP TABLE IF EXISTS b;

create table a
(
    DT  DATE,
    USER INT,
    COMMENT_SENTIMENT INT,
    PRIMARY KEY (USER, DT));
CREATE INDEX a_DT_USER_IDX ON a (DT,USER);

create table b
(
    id                                int auto_increment primary key,
    DT  DATETIME(6),
    USER mediumtext,
    COMMENT_SENTIMENT INT);
CREATE INDEX b_DT_USER_IDX ON b (DT);
CREATE UNIQUE INDEX b_DT_USER ON b (USER(16), DT);

# Insert some dummy data
INSERT INTO a VALUES('2023-01-01', 5, 4);
INSERT INTO b VALUES(NULL, '2023-01-01 00:00:00', 5, 4);

# Explain that shows the issue I was seeing.
EXPLAIN
SELECT *
FROM a
JOIN b
ON a.DT = b.DT
AND a.USER = b.USER;

# Explain with the fix ysth suggested
EXPLAIN
SELECT *
FROM a
JOIN b
ON a.DT = b.DT
AND a.USER = CAST(b.USER AS DECIMAL);
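
A rough way to compare the two statements, assuming MySQL 5.7 and the tables a and b exactly as created above: run SHOW WARNINGS right after each EXPLAIN. The "Cannot use ref access on index ..." message is reported there as a warning, so it would be expected after the first form and not after the CAST form. The exact wording and the index named in the message may differ between versions.

```sql
# Assumes tables a and b exist as created above.
EXPLAIN
SELECT *
FROM a
JOIN b
ON a.DT = b.DT
AND a.USER = b.USER;

# Lists the notes and warnings recorded by the EXPLAIN that just ran.
SHOW WARNINGS;
```

Repeating the same pair of statements with the CAST version of the join makes the difference visible side by side.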

__

The information below is incorrect. Please see the edit above for the issue I was having and its solution.

I have three tables a, b, and c in my MySQL 5.7 database. SHOW CREATE statements for each table are:

CREATE TABLE `a` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `DT` date DEFAULT NULL,
  `USER` int(11) DEFAULT NULL,
  `COMMENT_SENTIMENT` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `a_DT_USER_IDX` (`DT`,`USER`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `b` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `DT` datetime DEFAULT NULL,
  `USER` int(11) DEFAULT NULL,
  `COMMENT_SENTIMENT` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `b_DT_USER_IDX` (`DT`,`USER`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `c` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `DT` date DEFAULT NULL,
  `USER` int(11) DEFAULT NULL,
  `COMMENT_SENTIMENT` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `b_DT_USER_IDX` (`DT`,`USER`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Table a has a DATE column a.DT, table b has a DATETIME column b.DT, and table c has a DATE column c.DT.

All of these DT columns are indexed.

As a caveat, while b.DT is a DATETIME, all of the 'time' portions in it are 00:00:00 and they always will be. It probably should be a DATE, but I cannot change it.

I want to join table a and table b on their DT columns, but EXPLAIN tells me that their indices are not used:

Cannot use ref access on index 'b.DT_datetime_index' due to type or collation conversion on field 'DT'

When I join tables a and b on a.DT and b.DT:

SELECT *
FROM a
JOIN b
ON a.DT = b.DT;

The result is much slower than when I do the same with a and c:

SELECT *
FROM a
JOIN c
ON a.DT = c.DT;

Is there a way to make the first query's join on a.DT = b.DT use the indices, specifically without altering the tables? I'm not sure whether b.DT having only 00:00:00 for the time portion could be relevant to a solution.

The end goal is a faster select using this join.

Thank you!

-- What I've done section --

I compared the joins on a.DT = b.DT and a.DT = c.DT and saw the time difference. I also tried wrapping b's DT column as DATE(b.DT), but EXPLAIN reported the same issue, which was expected.

CodePudding user response:

MySQL won't use an index to join DATE and DATETIME columns.

You can create a virtual column with the corresponding DATE and use that.

CREATE TABLE `b` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `DT` datetime DEFAULT NULL,
  `USER` int(11) DEFAULT NULL,
  `COMMENT_SENTIMENT` int(11) DEFAULT NULL,
  `DT_DATE` DATE AS (DATE(DT)),
  PRIMARY KEY (`id`),
  KEY `b_DT_USER_IDX` (`DT_DATE`,`USER`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

SELECT *
FROM a
JOIN b
ON a.DT = b.DT_DATE;
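
If table b already exists, the same idea could be applied with ALTER TABLE instead of recreating the table. This is only a sketch: it assumes MySQL 5.7 with InnoDB and the original definition of b (USER as int), the index name b_DT_DATE_USER_IDX is made up for illustration, and it does alter the table, which the question hoped to avoid.

```sql
# Add a virtual generated column holding the date part of DT.
ALTER TABLE b
  ADD COLUMN DT_DATE DATE AS (DATE(DT)) VIRTUAL;

# Index the generated column (hypothetical index name).
ALTER TABLE b
  ADD INDEX b_DT_DATE_USER_IDX (DT_DATE, USER);
```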

CodePudding user response:

Assuming you want to read a and join b rows, you can write:

SELECT *
FROM a
JOIN b
ON b.DT = timestamp(a.DT);

If it is the other way around, then:

SELECT *
FROM b
JOIN a
ON a.DT = date(b.DT);

No need for a virtual column.
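
Both forms depend on which table is read first, and the optimizer normally makes that choice. As a sketch, assuming the tables from the question, STRAIGHT_JOIN can be used to force a to be read before b, and EXPLAIN shows whether the index on b.DT is then picked for the lookup. Whether this is actually faster depends on the data and is not something this example demonstrates.

```sql
# STRAIGHT_JOIN forces the left table (a) to be read before the right table (b).
EXPLAIN
SELECT *
FROM a
STRAIGHT_JOIN b
ON b.DT = TIMESTAMP(a.DT);
```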
