Just as the title states. I am trying to load a ~8.45 GB CSV file with ~330 columns (~7.5 million rows) into a SQL Server instance, but I'm doing the parsing myself because the file has some quirks (commas and quotes inside fields, etc.). The heavy-duty bulk insert and line parsing are done as below:
var dataTable = new DataTable(TargetTable);
using var streamReader = new StreamReader(FilePath);
using var bulkCopy = new SqlBulkCopy(this._connection, SqlBulkCopyOptions.TableLock, null)
{
DestinationTableName = TargetTable,
BulkCopyTimeout = 0,
BatchSize = BatchSize,
};
// ...
string line;
var outputFields = new string[columnsInCsv];
this._connection.Open();
while ((line = streamReader.ReadLine()) != null)
{
//get data
CsvTools.ParseCsvLineWriteDirect(line, ref outputFields);
// insert into datatable
dataTable.LoadDataRow(outputFields, true);
// update counters
totalRows++;
rowCounter++;
if (rowCounter >= BatchSize)
{
try
{
// load data
bulkCopy.WriteToServer(dataTable); // this works.
//Task.Run(async () => await bulkCopy.WriteToServerAsync(dataTable)); // this does not.
//bulkCopy.WriteToServerAsync(dataTable)) // this does not write to the table either.
rowCounter = 0;
dataTable.Clear();
}
catch (Exception ex)
{
Console.Error.WriteLine(ex.ToString());
return;
}
}
}
// check if we have any remnants to load
if (dataTable.Rows.Count > 0)
{
bulkCopy.WriteToServer(dataTable); // same here as above
//Task.Run(async () => await bulkCopy.WriteToServerAsync(dataTable));
//bulkCopy.WriteToServerAsync(dataTable));
dataTable.Clear();
}
this._connection.Close();
Obviously I would like this to be as fast as possible. I noticed via profiling that the WriteToServerAsync method was almost 2x as fast (in its execution duration) as the WriteToServer method, but when I use the async version, no data appears to be written to the target table (whereas the non-async version commits the data fine, just much more slowly). I'm assuming there is something I forgot (to somehow trigger the commit to the table), but I am not sure what could prevent the data from being committed to the target table.
Note that I am aware that SQL Server has a BULK INSERT statement, but I need more control over the data for other reasons and would prefer to do this in C#. Also perhaps relevant: I am using SQL Server 2022 Developer edition.
CodePudding user response:
Fire and forget tasks
Performing Task.Run(...) or calling DoSomethingAsync() without a corresponding await essentially makes the task a fire-and-forget task. The "fire" refers to the creation of the task; the "forget" reflects that the coder is not interested in any return value (if applicable) or in knowing when the task completes.
Though not immediately problematic, it becomes a problem if the calling thread or process exits before the task completes: the task is terminated before it finishes. This typically occurs in short-lived processes such as console apps, and not so much in, say, Windows Services or web apps with 20-minute App Domain timeouts.
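To see the effect in miniature, here is a small, self-contained console sketch (not from the question) where the pending task is abandoned when Main returns:

using System;
using System.Threading.Tasks;

class FireAndForgetDemo
{
    static void Main()
    {
        // Started but never awaited: a fire-and-forget task.
        Task.Run(async () =>
        {
            await Task.Delay(2000);              // stand-in for slow I/O such as a bulk insert
            Console.WriteLine("Work finished");  // likely never printed
        });

        Console.WriteLine("Main exiting");
        // Main returns immediately; the process ends and the pending task dies with it.
    }
}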
Example
Sending an asynchronous keep-alive/heartbeat to a remote service or monitor (see the sketch after this list):
- there is no return value to monitor, asynchronous or otherwise;
- it won't matter if it fails, as a more up-to-date call will eventually replace it;
- it won't matter if it doesn't complete because the hosting process exits; after all, it is a heartbeat, and if the process has ended naturally there is no heart left to beat.
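A minimal sketch of such a deliberate fire-and-forget heartbeat (the class, method name, and endpoint are illustrative assumptions):

using System.Net.Http;
using System.Threading.Tasks;

class HeartbeatDemo
{
    private static readonly HttpClient Client = new HttpClient();

    // Deliberately fire and forget: we neither await nor observe the task.
    public static void StartHeartbeat()
    {
        _ = Task.Run(() => SendHeartbeatAsync());
    }

    private static async Task SendHeartbeatAsync()
    {
        // If the process exits first, the request is simply abandoned;
        // for a heartbeat that is acceptable.
        await Client.GetAsync("https://example.com/heartbeat"); // hypothetical endpoint
    }
}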
Awaited tasks
Consider prefixing the call with an await, as in await bulkCopy.WriteToServerAsync(...). This way the task is linked to the parent task/thread, ensuring the parent (unless it is terminated by some other means) does not exit before the task completes.
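Applied to the question's batch loop, the write would look something like this (a sketch, assuming the enclosing method is changed to return async Task):

if (rowCounter >= BatchSize)
{
    // Awaiting ensures each batch is fully written before parsing continues
    // and before the process has any chance to exit.
    await bulkCopy.WriteToServerAsync(dataTable);
    rowCounter = 0;
    dataTable.Clear();
}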
Naturally, the containing method and those in the call stack will need to be marked async and have await prefixes on the corresponding calls. This "async all the way" creates a nice daisy chain of linked tasks all the way up to the parent (or at least to the last method in the call chain with an await or a legacy ContinueWith()).
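Putting it together, a rough "async all the way" skeleton might look like the following (the connection string, table name, and method names are assumptions, and Microsoft.Data.SqlClient is interchangeable with System.Data.SqlClient here):

using System.Data;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

class Loader
{
    // async Main: the awaited chain keeps the process alive until the load completes.
    static async Task Main()
    {
        await LoadCsvAsync("Server=.;Database=Target;Integrated Security=true");
    }

    static async Task LoadCsvAsync(string connectionString)
    {
        using var connection = new SqlConnection(connectionString);
        await connection.OpenAsync();

        using var bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null)
        {
            DestinationTableName = "TargetTable", // assumption
            BulkCopyTimeout = 0,
        };

        var dataTable = new DataTable("TargetTable");
        // ... build each batch exactly as in the question, then await every write:
        await bulkCopy.WriteToServerAsync(dataTable);
    }
}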