I am writing a project related to mass data fetching. Currently I am using .NET Framework 4.8 and the MySql package to open a connection and insert data into the database server.
I am going to insert around 400,000 lines/second. I am concerned that the SQL connection may become the bottleneck of my program. I would like to know: if I create multithreaded connections to the SQL server and insert the data using a consumer queue, would it be faster, and is it worth it (pros and cons)?
My intuition says it would be faster, but I am not sure how much performance it provides relative to the overhead of the threads. I am not a SQL expert, so it would be nice if someone could explain the pros and cons of opening multiple connections to a SQL server on multiple threads.
CodePudding user response:
Rumors, opinions, hearsay, facts, version-dependent benchmarks, some personal experience, etc...
Multiple threads will improve throughput, but with limits:
- Throughput is capped at about half the theoretical limit. (Your "certain percentage") (This is based on benchmarks from a multi-threaded package; I forget the name; it was a decade ago.)
- Multiple threads will compete with each other over Mutexes and other necessary locking mechanisms.
- As of about MySQL 5.7, 64 threads was the practical limit for multi-threaded writes; above that, throughput stalled or even dropped. (Source: Many Oracle benchmarks bragging about how much better one version was than the previous.) (Meanwhile, latency for each thread went through the roof.)
- Each thread should batch the data if possible (a rough sketch of such a pipeline follows this list).
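To make that concrete, here is a minimal sketch of a bounded producer/consumer queue feeding a small pool of worker threads, each holding its own MySqlConnection and flushing rows in batches. Everything specific in it (the Row type, table/column names, thread count, batch size, connection string) is invented for illustration; only the overall shape matters, and the body of the batched insert itself is sketched further down.

```csharp
// Rough sketch only: bounded queue -> fixed pool of workers -> per-thread
// MySqlConnection -> batched flushes. All names/sizes here are invented.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;
using MySql.Data.MySqlClient;

class Row { public DateTime Ts; public int SensorId; public double Value; }

static class Ingestor
{
    // Bounded so a slow database pushes back on the fetcher instead of
    // letting the queue eat all available memory.
    static readonly BlockingCollection<Row> Queue = new BlockingCollection<Row>(100000);

    static void Main()
    {
        const string connStr = "server=...;database=...;uid=...;pwd=...";
        const int workerCount = 8;            // keep this well below ~64 (see above)

        var workers = new Task[workerCount];
        for (int i = 0; i < workerCount; i++)
            workers[i] = Task.Run(() => Consume(connStr));

        // Demo producer; in the real program the data-fetching code calls Queue.Add.
        for (int n = 0; n < 5000; n++)
            Queue.Add(new Row { Ts = DateTime.UtcNow, SensorId = n % 100, Value = n });

        Queue.CompleteAdding();               // lets the consumers drain and exit
        Task.WaitAll(workers);
    }

    static void Consume(string connStr)
    {
        const int batchSize = 500;            // "hundreds" per the notes above
        var batch = new List<Row>(batchSize);
        using (var conn = new MySqlConnection(connStr))
        {
            conn.Open();
            foreach (var row in Queue.GetConsumingEnumerable())
            {
                batch.Add(row);
                if (batch.Count >= batchSize) { Flush(conn, batch); batch.Clear(); }
            }
            if (batch.Count > 0) Flush(conn, batch);   // drain the tail
        }
    }

    // Placeholder: one multi-row INSERT per call; see the batching sketch below.
    static void Flush(MySqlConnection conn, List<Row> batch) { }
}
```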
Batching:
- LOAD DATA is the fastest way to INSERT lots of rows from a single thread at one time. But if you include the cost of writing the file to LOAD, that may make it effectively slower than batched inserting.
- Batched INSERT is a close second. But it caps out at "hundreds" of rows, when it hits either some limit or "diminishing returns".
- Batched inserting is 10 times as fast as inserting one row per INSERT query. So it (or LOAD DATA) is worth using for high-speed ingestion. (Source: many different timed tests.)
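To make "batched INSERT" concrete, here is a rough sketch of what the Flush placeholder from the earlier sketch might look like: one statement, many VALUES groups, all parameterized, one round trip per batch. Table and column names are invented, and it reuses the Row type from the sketch above. (If you go the LOAD DATA route instead, you would write the batch to a file and run LOAD DATA LOCAL INFILE, which has to be enabled on both the client and the server.)

```csharp
// Rough sketch of a batched INSERT: one statement, many VALUES groups, all
// parameterized. Table/column names are invented; tune batchSize by measuring.
using System.Collections.Generic;
using System.Text;
using MySql.Data.MySqlClient;

static class BatchWriter
{
    public static void Flush(MySqlConnection conn, List<Row> batch)
    {
        if (batch.Count == 0) return;

        var sql = new StringBuilder("INSERT INTO readings (ts, sensor_id, value) VALUES ");
        using (var cmd = new MySqlCommand { Connection = conn })
        {
            for (int i = 0; i < batch.Count; i++)
            {
                if (i > 0) sql.Append(',');
                sql.AppendFormat("(@t{0},@s{0},@v{0})", i);
                cmd.Parameters.AddWithValue("@t" + i, batch[i].Ts);
                cmd.Parameters.AddWithValue("@s" + i, batch[i].SensorId);
                cmd.Parameters.AddWithValue("@v" + i, batch[i].Value);
            }
            cmd.CommandText = sql.ToString();
            cmd.ExecuteNonQuery();            // one round trip for the whole batch
        }
    }
}
```

Note that the whole statement must fit in max_allowed_packet, which is one of the limits the "hundreds of rows" cap alludes to.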
Source of data:
- Some data sources necessarily deliver only one row at a time (e.g., sensor data from vehicles every N seconds). This begs for some intermediate layer that batches the data (a rough sketch of one approach follows this list).
- A discussion of gathering data: http://mysql.rjweb.org/doc.php/staging_table
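I have not reproduced the linked article here, but the gist (as I read it) is a "staging" table: fast single-row writers insert into a small staging table, and a background job periodically swaps it for an empty one and moves the drained rows onward in one bulk statement. A very rough sketch, with invented table names; see the article for the full recipe:

```csharp
// Very rough version of the staging-table idea: ingest lands in a small
// `staging` table; periodically swap it for an empty one (RENAME TABLE is
// atomic) and bulk-copy the drained rows into the main table.
using MySql.Data.MySqlClient;

static class StagingFlipper
{
    public static void FlipAndDrain(MySqlConnection conn)
    {
        Exec(conn, "CREATE TABLE staging_next LIKE staging");
        // Atomic swap: new writes now land in the fresh, empty `staging`.
        Exec(conn, "RENAME TABLE staging TO staging_drain, staging_next TO staging");
        // Move the drained rows in one bulk statement, then discard the holder.
        Exec(conn, "INSERT INTO readings SELECT * FROM staging_drain");
        Exec(conn, "DROP TABLE staging_drain");
    }

    static void Exec(MySqlConnection conn, string sql)
    {
        using (var cmd = new MySqlCommand(sql, conn)) cmd.ExecuteNonQuery();
    }
}
```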
What happens after loading the data? Surely this is not a write-only-never-read table.
- Normalization is useful for shrinking the disk footprint; this is best done in batches. See Normalization
- PARTITIONing is rarely useful, except for eventual purging of old data. See Partition
- A huge 'Fact' table is hard to search; consider building Summary data as you ingest the data: Summary Tables (a sketch of one way to do this follows this list)
- It may even be practical to do the above processing, then toss the raw data. At 400K rows/second you are looking at roughly 35 billion rows per day; even at a few dozen bytes per row, that is on the order of a terabyte of data per day.
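For the summary-table idea, one common MySQL pattern is an upsert into a small pre-aggregated table as each batch goes in, so reports never have to scan the giant fact table. A sketch, assuming an hourly rollup per sensor with a UNIQUE key on (sensor_id, hr); all names are invented:

```csharp
// Illustrative only: bump an hourly summary row per sensor while ingesting.
// Assumes readings_hourly has a UNIQUE key on (sensor_id, hr).
using System;
using MySql.Data.MySqlClient;

static class SummaryWriter
{
    public static void Bump(MySqlConnection conn, int sensorId, DateTime ts, double value)
    {
        const string sql =
            "INSERT INTO readings_hourly (sensor_id, hr, reading_count, reading_sum) " +
            "VALUES (@sensor, @hr, 1, @value) " +
            "ON DUPLICATE KEY UPDATE " +
            "  reading_count = reading_count + 1, " +
            "  reading_sum   = reading_sum + @value";
        using (var cmd = new MySqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@sensor", sensorId);
            cmd.Parameters.AddWithValue("@hr", new DateTime(ts.Year, ts.Month, ts.Day, ts.Hour, 0, 0));
            cmd.Parameters.AddWithValue("@value", value);
            cmd.ExecuteNonQuery();
        }
    }
}
```

At this volume you would not call this once per row; aggregate each batch in memory first and do one upsert per (sensor, hour) group.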