Can PostgreSQL handle data at the 1-billion-row level?


Can PostgreSQL handle data at the 1-billion-row level?

CodePudding user response:

A single table can probably handle it, as long as the rows are small (few fields, short fields).
You could also try a partitioned table, but that is not ideal either.

CodePudding user response:

I'm considering switching from PostgreSQL to ClickHouse, but I've never used ClickHouse and don't know what pitfalls it has.

CodePudding user response:

Yes, it can handle it with partitioning; the main thing is to keep the size of the indexes under control.
If you want something simpler, you can stay with a single table and split the indexes instead, for example partial indexes built over ranges.
For partitioning, see:
https://blog.csdn.net/goldenhawking/article/category/1222775
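
As I read it, "split the indexes according to ranges" means partial indexes whose WHERE clauses cover disjoint ranges of a column. A minimal sketch of that idea, with made-up table and column names (not from this thread):

```sql
-- Hypothetical single (unpartitioned) table; all names are illustrative.
CREATE TABLE logs (
    id  bigserial PRIMARY KEY,
    ts  timestamptz NOT NULL,
    msg text
);

-- Split the index by range so each index stays small, instead of one
-- huge index over the whole billion-row table.
CREATE INDEX logs_ts_2023 ON logs (ts)
    WHERE ts >= '2023-01-01' AND ts < '2024-01-01';
CREATE INDEX logs_ts_2024 ON logs (ts)
    WHERE ts >= '2024-01-01' AND ts < '2025-01-01';

-- A query whose predicate falls inside one of the ranges can use the
-- matching partial index and never touches the others.
SELECT count(*)
FROM logs
WHERE ts >= '2024-03-01' AND ts < '2024-04-01';
```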

CodePudding user response:

The partitioning test in that blog only goes up to ten million rows, but a billion should behave similarly; inserting a billion rows simply takes too long, so those numbers weren't posted. In my production environment I store about 600 million records, more than 400 GB, without any big problems.

When partitioning as in that article, write the CHECK constraints on the child tables carefully, so that queries can directly rule out child tables that cannot match and avoid scanning them unnecessarily.
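
For reference, the "write the CHECK constraints well" advice corresponds to inheritance-style partitioning, where the planner uses each child table's CHECK constraint to skip child tables that cannot match. A rough sketch with illustrative names (PostgreSQL 10+ can use declarative PARTITION BY RANGE instead):

```sql
-- Parent table plus child tables with tight CHECK constraints
-- (inheritance-style partitioning; all names are illustrative).
CREATE TABLE measurements (
    id  bigint,
    ts  timestamptz NOT NULL,
    val double precision
);

CREATE TABLE measurements_2024_q1 (
    CHECK (ts >= '2024-01-01' AND ts < '2024-04-01')
) INHERITS (measurements);

CREATE TABLE measurements_2024_q2 (
    CHECK (ts >= '2024-04-01' AND ts < '2024-07-01')
) INHERITS (measurements);

-- With constraint_exclusion = partition (the default), a query filtering
-- on ts scans only the child tables whose CHECK constraint can match,
-- i.e. the "skip unnecessary child tables" effect described above.
SET constraint_exclusion = partition;
EXPLAIN
SELECT * FROM measurements
WHERE ts >= '2024-05-01' AND ts < '2024-06-01';
```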

CodePudding user response:

PostgreSQL's performance is close to Oracle's, so there should basically be no problem.

CodePudding user response:

It depends on how you intend to use that 1 billion rows of data.

CodePudding user response:

Quoting 丁劲犇's reply on the 3rd floor:
Yes, it can handle it with partitioning; the main thing is to keep the size of the indexes under control.
If you want something simpler, you can stay with a single table and split the indexes instead, for example partial indexes built over ranges.
For partitioning, see:
https://blog.csdn.net/goldenhawking/article/category/1222775


Right now it takes me more than 20 minutes to insert over 200 million rows. Is there any way to improve database write performance?
Thank you very much.

CodePudding user response:

There are many factors that affect database write performance; let me sum them up in a few points. If you have the money and are willing to put in a budget in the 100,000-200,000 range, go straight to the final point (0).
1. If you are importing a whole backup (e.g., pg_restore), use COPY; the fastest approach is to finish the import first and build the indexes and constraints afterwards (see the SQL sketch after this list).
2. If the whole table can be dropped or renamed, create a temporary table and load it as in (1); after the import finishes, rename the old table to _OLD and give the new table the correct name.
3. Most situations are not that ideal: you face a hot import where the structure, constraints, referential integrity, and table name cannot be modified. In that case, follow points 4-6.
4. Before importing, adjust the parameters in postgresql.conf:
(1) Turn autovacuum off for the duration of the import.
(2) Tune the WAL/checkpoint settings to relax checkpoints, and make the shared and maintenance memory large: on a 32 GB server, set shared_buffers to 1 GB and maintenance_work_mem to 1 GB.
(3) If the import produces a lot of trigger warnings, raise the various log levels to FATAL or above, so it does not generate a huge amount of text logs.
5. The import is usually done with a tool (Qt/C++, Python, etc.), and writing that tool well is the key. Taking Qt as an example:
(4) Depending on whether you need to look data up after the import, set forwardOnly ("forward only") on the database query as early as possible, to reduce the cost of queries.
(5) Batch the work into units of 100,000 rows per transaction, call QSqlQuery::execBatch to insert the whole batch at once, and commit after the insert.
6. If you have done all of the above and it is still slow, analyze the actual bottleneck and optimize for it:
(6) Watch the network utilization. If it is >= 90%, import on the database host locally rather than over Ethernet; going through Ethernet adds a lot of overhead.
(7) Watch the CPU of the server-side import process (postgres). If the CPU cannot even saturate a single core, the disk is saturated; use SSDs or a spinning-disk array. This case typically appears when the traffic is dense but each record is simple.
(8) Watch the CPU of the server-side import process (postgres). If a single core is saturated but the disk is not, computing power is the bottleneck; drive the inserts from N worker sessions until CPU and disk reach an equilibrium (usually each session around 50% of one core, with the disk slightly congested). This case is rarer, e.g., when your records contain many arrays, JSON or complex types such as GIS, or a trigger is doing calculations during the insert.
That's basically it... Finally, the brute-force option:
(0) At the end of the day, money can significantly improve the insert rate. Remember: money buys brute force. Get an array of 8-10 SSDs and a Xeon CPU in the five-to-six-thousand class, then insert from parallel sessions, and you won't need to look at points 1-8 at all.
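
To make points 1 and 4 concrete, here is roughly what a cold bulk load looks like in SQL. The table name, file path, column list and memory values are placeholders to adapt to your own setup; the parameters themselves (autovacuum, shared_buffers, maintenance_work_mem, max_wal_size, checkpoint_timeout, log_min_messages, synchronous_commit) are standard postgresql.conf / session settings:

```sql
-- postgresql.conf before the import (point 4); restore the values afterwards:
--   autovacuum = off                -- (1) no autovacuum during the load
--   shared_buffers = 1GB            -- (2) e.g. on a 32 GB server
--   maintenance_work_mem = 1GB      -- (2) speeds up CREATE INDEX afterwards
--   max_wal_size = 16GB             -- (2) relax checkpoints
--   checkpoint_timeout = 30min      -- (2) relax checkpoints
--   log_min_messages = fatal        -- (3) avoid flooding the text logs

-- Point 1: load with COPY first, build indexes and constraints last.
BEGIN;
SET LOCAL synchronous_commit = off;           -- don't wait for WAL flush on commit

COPY big_table (id, ts, payload)
FROM '/path/to/data.csv' WITH (FORMAT csv);   -- placeholder path and columns

COMMIT;

-- Only after the data is loaded:
CREATE INDEX big_table_ts_idx ON big_table (ts);
ALTER TABLE big_table ADD CONSTRAINT big_table_pk PRIMARY KEY (id);
ANALYZE big_table;                            -- refresh planner statistics
```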


CodePudding user response:

Quoting newpb80's reply on the 7th floor:
Quote: 丁劲犇's reply on the 3rd floor:
Yes, it can handle it with partitioning; the main thing is to keep the size of the indexes under control.
If you want something simpler, you can stay with a single table and split the indexes instead, for example partial indexes built over ranges.
For partitioning, see:
https://blog.csdn.net/goldenhawking/article/category/1222775


Right now it takes me more than 20 minutes to insert over 200 million rows. Is there any way to improve database write performance?
Thank you very much.


https://blog.csdn.net/goldenhawking/article/details/107583951