How to improve the performance of PostgreSQL data import

Time:09-16

To the database optimization experts:

I am facing a performance problem in current development and would be grateful for help improving it.

There are four tables, each with seven corresponding CSV files, 28 files in total. Together the 28 CSV files are 16 GB, and one of the tables receives about 200 million rows. The data is imported into PostgreSQL with the COPY command, driven by a Java program that uses a thread pool to run up to 28 threads at once, one COPY per thread. The environment is CentOS with 16 GB of RAM and a 12-core, 24-thread CPU; the disk write speed is 200 megabits per second.

The average time to complete the import is about 1,400 seconds, just over 23 minutes. iostat shows IO utilization at 100%, so the bottleneck appears to be IO; CPU utilization is only about 20%.

The customer's actual data volume will be about 70 times this amount, which works out to nearly 27 hours. The customer will not accept that.

I have never imported this much data before, and since PostgreSQL's COPY is already a bulk import, I am not sure what else can improve performance. Expert guidance would be much appreciated.
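For reference, here is a minimal sketch of the kind of loader described above, assuming the PostgreSQL JDBC driver's CopyManager API is used to stream each CSV file through COPY FROM STDIN. The connection URL, credentials, and table/file names are placeholders, since the original program is not shown:

    import org.postgresql.PGConnection;
    import org.postgresql.copy.CopyManager;

    import java.io.FileReader;
    import java.io.Reader;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class ParallelCopyLoader {
        static final String URL = "jdbc:postgresql://localhost:5432/mydb"; // placeholder

        public static void main(String[] args) {
            // One task per CSV file: 28 files across 4 tables, as described.
            String[][] jobs = {
                    {"table1", "/data/table1_part1.csv"},
                    // ... 27 more table/file pairs
            };
            ExecutorService pool = Executors.newFixedThreadPool(28);
            for (String[] job : jobs) {
                pool.submit(() -> load(job[0], job[1]));
            }
            pool.shutdown();
        }

        static void load(String table, String csvPath) {
            // Each thread opens its OWN connection, i.e. its own server backend.
            try (Connection conn = DriverManager.getConnection(URL, "user", "pass");
                 Reader csv = new FileReader(csvPath)) {
                CopyManager copier = conn.unwrap(PGConnection.class).getCopyAPI();
                long rows = copier.copyIn(
                        "COPY " + table + " FROM STDIN WITH (FORMAT csv)", csv);
                System.out.println(table + ": " + rows + " rows loaded");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

Note that with the disk already at 100% utilization, adding threads cannot help; the levers are faster storage or less IO per row, or both.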

CodePudding user response:

Move to a solid-state (SSD) disk array. On the IO side, mechanical disks handle multiple concurrent sessions poorly; the head seeking back and forth wastes too much time.

Also, with this many records, even once the import succeeds, querying will be painful. Will you partition the table? It feels like a lot of trouble.
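On the partitioning idea: PostgreSQL 10+ supports declarative range partitioning, and COPY into the parent table routes each row to the matching partition, so the loader itself would not need to change. A minimal sketch issued over JDBC, with hypothetical table and column names:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class PartitionSetup {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:postgresql://localhost:5432/mydb"; // placeholder
            try (Connection conn = DriverManager.getConnection(url, "user", "pass");
                 Statement st = conn.createStatement()) {
                // Hypothetical schema: split the 200-million-row table by date.
                st.execute("CREATE TABLE big_table (id bigint, created date, payload text)"
                         + " PARTITION BY RANGE (created)");
                st.execute("CREATE TABLE big_table_2023q1 PARTITION OF big_table"
                         + " FOR VALUES FROM ('2023-01-01') TO ('2023-04-01')");
                // COPY ... FROM on big_table now lands rows in the right partition.
            }
        }
    }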

CodePudding user response:

Thank you very much for the reply.

Apart from moving to a solid-state SSD disk array, are there any other ways to optimize? Would adjusting PostgreSQL parameters help? If anyone has a good set of parameter adjustments, please share them.

Fellow database experts: if you have hands-on experience importing this much data into PostgreSQL, please share it. I would like to know whether things can be further optimized at the code level or through parameter tuning.
Thank you very much.
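Not advice given in this thread, but for context, the settings most commonly adjusted for bulk loads mainly reduce WAL and checkpoint pressure; whether they help here is an open question, since the disk is already saturated. A sketch of the session-level part, applied from Java on each loader connection before COPY:

    import java.sql.Connection;
    import java.sql.Statement;

    public class BulkLoadSettings {
        // Session-level settings, applied per loader connection before COPY.
        static void tune(Connection conn) throws Exception {
            try (Statement st = conn.createStatement()) {
                st.execute("SET synchronous_commit TO off");     // don't wait for WAL flush on commit
                st.execute("SET maintenance_work_mem TO '1GB'"); // speeds up post-load CREATE INDEX
            }
            // Server-side settings (postgresql.conf, reload or restart needed):
            //   max_wal_size = 8GB, checkpoint_timeout = 30min  -- fewer checkpoints mid-load
            //   wal_level = minimal, max_wal_senders = 0        -- COPY into a table created or
            //     truncated in the same transaction can then skip most WAL writes
            // Also common: drop indexes before the load and recreate them afterwards.
        }
    }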

CodePudding user response:

A question for the experts:
If multiple threads share the same database connection, and each uses PostgreSQL's COPY method to read a CSV file and write to a different table, can the writes to the database actually happen concurrently?
Expert advice please.
Thank you very much.

CodePudding user response:

Experts:
To guarantee data consistency, so that everything can be rolled back if any insert fails, I want the multiple threads to share a single connection, each calling PostgreSQL's COPY method to read the CSV files for the different tables.
What I do not know is whether, with multiple threads on the same connection, PostgreSQL will actually process the commands concurrently.
Can anyone tell me whether PostgreSQL handles this concurrently?
Thanks, everyone, for any pointers.

CodePudding user response:

On the PostgreSQL server, one connection is one backend process, and commands on that connection execute strictly serially: one command must finish before the next can start. On the client side, multiple threads sharing a single connection would also have to lock and queue to use it. So multithreading over one shared connection gains essentially nothing.
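If the reason for wanting a single shared connection is all-or-nothing semantics, one standard PostgreSQL feature, not raised in this thread, gives that while keeping one connection per thread: two-phase commit. Each loader runs its COPY in a transaction and prepares it; a coordinator then commits or rolls back all the prepared transactions together. This requires max_prepared_transactions > 0 on the server; the identifiers below are hypothetical:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class TwoPhaseLoad {
        static final String URL = "jdbc:postgresql://localhost:5432/mydb"; // placeholder

        // In each loader thread, after its COPY succeeds. Assumes
        // conn.setAutoCommit(false) and that the COPY ran in this transaction.
        static void prepare(Connection conn, int taskId) throws Exception {
            try (Statement st = conn.createStatement()) {
                // Ends the transaction but leaves it pending on the server.
                st.execute("PREPARE TRANSACTION 'load_" + taskId + "'");
            }
        }

        // In the coordinator, once every loader has prepared (or any has failed).
        // In real code, roll back only the transactions that actually prepared.
        static void finish(boolean allPrepared, int taskCount) throws Exception {
            try (Connection conn = DriverManager.getConnection(URL, "user", "pass");
                 Statement st = conn.createStatement()) {
                for (int i = 0; i < taskCount; i++) {
                    st.execute((allPrepared ? "COMMIT PREPARED" : "ROLLBACK PREPARED")
                             + " 'load_" + i + "'");
                }
            }
        }
    }

The tradeoff is operational: prepared transactions hold their locks and WAL until resolved, so the coordinator must be sure to commit or roll back every one of them.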