Home > Software engineering >  Historical big data slow queries
Historical big data slow queries

Time:07-10

I have problem with slow queries.

PS

MariaDB: mariadb:10.3.25 - InnoDB

I optimized most DB configurations

Structure

create table customers
(
    id              bigint unsigned auto_increment
        primary key,
 
    email           varchar(255)                                                                                 null,
    full_name       varchar(255)                                                                                 null,
    country         varchar(2)                                                                                   null,
    first_name      varchar(255)                                                                                 null,
    second_name     varchar(255)                                                                                 null,
    company_name    varchar(255)                                                                                 null,
    gender          char                                                                                         null,
    birth_date      date                                                                                         null,
    state           varchar(3)                                                                                   null,
null,
    custom_field_1  varchar(255)                                                                                 null,
    custom_field_2  varchar(255)                                                                                 null,
    custom_field_3  varchar(255)                                                                                 null,
    created_at      timestamp                                                                                    null,
    updated_at      timestamp                                                                                    null,
    deleted_at      timestamp                                                                                    null
)
    collate = utf8mb4_unicode_ci;
create table customer_daily_stats
(
    date        date                   not null,
    campaign_id  bigint                 not null,
    customer_id bigint                 not null,
    event_1        int unsigned default 0 not null,
    event_2       int unsigned default 0 not null,
    event_3      int unsigned default 0 not null,
    event_4        int unsigned default 0 not null,
    event_5   int unsigned default 0 not null,
    constraint customer_daily_stats_date_customer_id_campaign_id_unique
        unique (date, customer_id, campaign_id)
)
    collate = utf8mb4_unicode_ci;

create index customer_daily_stats_customer_id_date_index
    on customer_daily_stats (customer_id, date);

create index customer_daily_stats_campaign_id_index
    on customer_daily_stats (campaign_id);

customers ~ 1 - 5 millions rows

customer_daily_stats ~ 1 - 100 millions rows

Queries

select 
    customers.*,
    IFNULL(
            SUM(events_aggregation.event_1),
            0
        ) as event_1,
    IFNULL(
            SUM(events_aggregation.event_2),
            0
        ) as event_2,
    IFNULL(
            SUM(events_aggregation.event_3),
            0
        ) as event_3,
    IFNULL(
            SUM(events_aggregation.event_4),
            0
        ) as event_4
from
    `customers`
        left join customer_daily_stats as events_aggregation on `customers`.`id` = `events_aggregation`.`customer_id`
        and `events_aggregation`.`date` between '2021-09-06' and '2022-07-06'
group by
    `customers`.`id`;

Problems

Main idea is to have possibility to get aggregation by any dates.

Problem is that works too slow now and i need to do addition aggregations which decrease performance. One more problem i don't have a lot of disc space (250G and about 80% used already).

I have:

  • customers ~ 1.5m
  • customer_daily_stats ~ 50.000
  • query speed ~ 5s

Questions

  • Is there any methods to optimize my DB or another tools?
  • Is there any DBs that help my to increase performance?

CodePudding user response:

Change the indexes. You currently have

unique (date, customer_id, campaign_id)
INDEX(customer_id, date)
INDEX(campaign_id)

Maybe Change to:

PRIMARY KEY(customer_id, date, campaign_id)
INDEX(campaign_id)

BUT... And this is a big BUT. This rearrangement of indexing may significantly hurt other queries. We really need to see

  • All the big queries
  • EXPLAIN SELECT for each

Did you notice that the range is 10 months plus 1 day? This is because BETWEEN is 'inclusive'.

If 80% of disk is already used, you are in deep weeds. Any fixes will require more than 20% of the disk to achieve.

One thing to do (when you have enough disk space) is to shrink BIGINT (8 bytes, probably an excessive range) and INT UNSIGNED (4 bytes, 4 billion max) to smaller int types where practical.

I'm confused. These seem to contradict each other; please clarify:

customer_daily_stats ~ 1 - 100 millions rows
customer_daily_stats ~ 50.000

Some more things to help with the analysis:

innodb_buffer_pool_size
RAM size
disk footprint for tables (GB)
  • Related