Home > Net >  Partitioning by range columns unexpected behavior
Partitioning by range columns unexpected behavior

Time:10-25

I have MySQL table partitioned by range columns (c_id and created_at) and I created 2 partitions:

  1. logs_1_2020 (c_id less than 2 and created less than 2021-01-01 00:00:00)

  2. logs_1_2021 (c_id less than 2 and created less than 2022-01-01 00:00:00)

When I run

INSERT INTO example_log_table (c_id, data, created)
    VALUES (1, 'test', '2021-10-24 18:16:08')

I'm supposed to find the result stored in logs_1_2021, but I was shocked when I found her in logs_1_2020.

Does anyone have an explanation for that?

This table SQL generator:

CREATE TABLE example_log_table (
                        id int auto_increment ,
                        c_id int,
                        data TEXT NOT NULL,
                        created DATETIME NOT NULL,
                        primary key (id,c_id,created)
) PARTITION BY RANGE columns (c_id,created)(
    PARTITION logs_1_2020 VALUES LESS THAN  (2,'2021-01-01 00:00:00'),
    PARTITION logs_1_2021 VALUES LESS THAN  (2,'2022-01-01 00:00:00')
);

CodePudding user response:

When you use multiple columns as your partitioning key, the placement is based on tuple comparison. You can test if a tuple is less than another tuple this way (MySQL 8.0):

select row(1, '2021-10-24 18:16:08') < row(2, '2021-01-01 00:00:00');
 --------------------------------------------------------------- 
| row(1, '2021-10-24 18:16:08') < row(2, '2021-01-01 00:00:00') |
 --------------------------------------------------------------- 
|                                                             1 |
 --------------------------------------------------------------- 

The rules for tuple inequality comparison are tricky. I suggest you read https://dev.mysql.com/doc/refman/8.0/en/partitioning-columns-range.html and https://dev.mysql.com/doc/refman/8.0/en/comparison-operators.html#operator_less-than carefully.

For row comparisons, (a, b) < (x, y) is equivalent to:

(a < x) OR ((a = x) AND (b < y))

In this case, 1 is less than 2, so the tuple you inserted is less than the tuple that defines the upper bound of partition logs_1_2020.

You can also think about how the data would be sorted if you were to query a set of rows with ORDER BY c_id,created. It would sort by c_id first, and then only in cases of ties on c_id it would sort the ties by created.

CodePudding user response:

Unless you have some good reason for that Partitioning, Drop it and change the indexes to

PRIMARY KEY(c_id, created, id),
INDEX(id)

If you expect to have lots of data and wish do delete "old data", the PARTITION BY RANGE over just created; this facilitates periodic DROP PARTITION. And the two indexes above are still valid and useful.

  • Related