Postgres level in the amount of data query optimization
Time:10-11
There are several SQL queries whose efficiency is very low. I don't know how to optimize them and hope to get some help.
1. Grouping by time: select to_char(date, 'YYYY-MM-DD HH24') as d, count(*) as c from table group by d order by d
2. Counting over a time range: select count(*) from table where date between '12-5' and '12-6'
3. Fetching all records matching a condition: select a.col1, a.col2, a.col3, a.col4 from table a join (select col1, col2 from table where col1='123' and col4 between '12-5' and '12-6' group by col1, col2) b using (col1, col2) order by a.col2, a.col4
CodePudding user response:
For the first query, try creating an expression index on the to_char() call and check whether the plan switches from a sequential scan to an index-only scan. For the second and third queries, it depends on the data distribution: if the where condition filters the table down to a small number of records, you can create a composite index on the columns in the condition. The third query is more complicated because of the join. If filtering by col1 and col4 leaves very little data, and col1 and col2 individually return few rows, optimize it the same way as query 2, and also create an index on the join columns col1 and col2.
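As a sketch of the indexes suggested above (the table name t and the column names follow the question; the exact DDL is an assumption, not taken from the thread):

```sql
-- Query 1: expression index matching the GROUP BY expression.
-- Only works if the expression is immutable; to_char() on a
-- timestamptz column is not, which the follow-up reply ran into.
CREATE INDEX idx_t_hour ON t (to_char(date, 'YYYY-MM-DD HH24'));

-- Query 2: index on the time column used in the BETWEEN filter.
CREATE INDEX idx_t_date ON t (date);

-- Query 3: composite indexes on the filter and join columns.
CREATE INDEX idx_t_col1_col4 ON t (col1, col4);
CREATE INDEX idx_t_col1_col2 ON t (col1, col2);
```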
CodePudding user response:
For the first query, the function index has a problem: to_char() on a value that carries a time zone is not immutable, so the index cannot be created; details here: https://github.com/digoal/blog/blob/master/201206/20120626_02.md. Switching to date_trunc() for the hourly grouping, the query still takes about 16 s on this data volume, far short of an order-of-magnitude improvement. For the second query, the time field was already indexed, but at this data volume the response time still cannot meet the requirements. For the third query, more than a million records match the condition, and the business requires fetching all of them in one pass before further processing and paging. col1 and col4 are already indexed; the col1/col2 combination returns a lot of data and has no index. After rewriting the query as a subquery the efficiency improved, but it still cannot meet the requirements: it takes about 13 s.
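A minimal sketch of the date_trunc() approach described above, assuming the column is timestamp without time zone (names are placeholders). For timestamptz, date_trunc() is only STABLE, so the index would need an immutable wrapper such as date_trunc('hour', timezone('UTC', date)):

```sql
-- Expression index on the truncated hour; immutable for plain timestamp.
CREATE INDEX idx_t_hour ON t (date_trunc('hour', date));

-- The GROUP BY expression must match the indexed expression exactly
-- for the planner to consider the index.
SELECT date_trunc('hour', date) AS d, count(*) AS c
FROM t
GROUP BY d
ORDER BY d;
```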
CodePudding user response:
If your SQL ultimately has to return data on the order of millions or tens of millions of rows, partitioning is your best option, and frankly a response time of ten-plus seconds is about what you should expect.
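A minimal sketch of declarative range partitioning by time (PostgreSQL 10+ syntax; table, columns, and boundary values are placeholders, not the asker's real schema):

```sql
CREATE TABLE t (
    date  timestamp NOT NULL,
    col1  text,
    col2  text,
    col3  text,
    col4  timestamp
) PARTITION BY RANGE (date);

-- One partition per month; the planner prunes partitions that fall
-- outside the queried date range, so scans touch far less data.
CREATE TABLE t_2012_12 PARTITION OF t
    FOR VALUES FROM ('2012-12-01') TO ('2013-01-01');
```

On PostgreSQL versions before 10 (likely the era of this thread), the same effect is achieved with table inheritance plus CHECK constraints and constraint_exclusion.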