Home > Enterprise >  SqlBulkCopy.WriteToServerAsync() does not write to target SQL Server table, bulkCopy.WriteToServer()
SqlBulkCopy.WriteToServerAsync() does not write to target SQL Server table, bulkCopy.WriteToServer()

Time:12-03

Just as the title states. I am trying to load a ~8.45GB csv file with ~330 columns (~7.5 million rows) into a SQL Server instance, but I'm doing the parsing internally as it the file has some quirks to it (with comma delimitations and quotes, etc). The heavy duty bulk insert and line parsing is done as below:

var dataTable = new DataTable(TargetTable);
using var streamReader = new StreamReader(FilePath);
using var bulkCopy = new SqlBulkCopy(this._connection, SqlBulkCopyOptions.TableLock, null)
{
    DestinationTableName = TargetTable,
    BulkCopyTimeout = 0,
    BatchSize = BatchSize,

};

/// ...
var outputFields = new string[columnsInCsv];
this._connection.Open();
while ((line = streamReader.ReadLine()) != null)
{
    //get data
    CsvTools.ParseCsvLineWriteDirect(line, ref outputFields);

    // insert into datatable
    dataTable.LoadDataRow(outputFields, true);

    // update counters
    totalRows  ;
    rowCounter  ;

    if (rowCounter >= BatchSize)
    {
        try
        {
            // load data
            bulkCopy.WriteToServer(dataTable); // this works.
            //Task.Run(async () => await bulkCopy.WriteToServerAsync(dataTable)); // this does not.
            //bulkCopy.WriteToServerAsync(dataTable)) // this does not write to the table either. 
            rowCounter = 0;
            dataTable.Clear();
        }
        catch (Exception ex)
        {
            Console.Error.WriteLine(ex.ToString());
            return;
        }
    }
}
// check if we have any remnants to load
if (dataTable.Rows.Count > 0)
{
    
    bulkCopy.WriteToServer(dataTable); // same here as above
    //Task.Run(async () => await bulkCopy.WriteToServerAsync(dataTable));
    //bulkCopy.WriteToServerAsync(dataTable));
    dataTable.Clear();
}
this._connection.Close();

Obviously I would like this to be fast as possible. I noticed via profiling that the WriteToServerAsync method was almost 2x as fast (in its execution duration) as the WriteToServer method, but when I use the async version, no data appears to be written to the target table (whereas the non-async version seems to commit the data fine but much more slowly). I'm assuming there is something here I forgot (to somehow trigger the commit to the table), but I am not sure what could prevent committing the data to the target table.

Note that I am aware that SQL Server has a BULK INSERT statement but I need more control over the data for other reasons and would prefer to do this in C#. Also perhaps relevant is that I am using SQL Server 2022 Developer edition.

CodePudding user response:

Fire and forget tasks

By performing Task.Run(...) or DoSomethingAsync() without a corresponding await essentially makes the task a fire and forget task. The "fire" refers to the creation of the task and the "forget" due to the fact that the coder appears not to be interested in any return value (if applicable) or desires any knowledge as to when the task completes.

Though not immediately problematic, it is if the calling thread or process exits before the task completes! The task will be terminated before completion. This problem typically occurs in short-lived processes such as console apps, not so much for say Windows Services, web apps with 20-minute App Domain timeouts et all.

Example

  • sending an asynchronous keep-alive/heartbeat to a remote service or monitor.

    • there is no return value to monitor, asynchronous or otherwise
    • It won't matter if it fails as a more up-to-date call will eventually replace it
    • It won't matter if it doesn't complete in time if the hosting process exits for some reason (after-all we are a heartbeat, if the process is ended naturally there is no heart to beat).

Awaited tasks

Consider prefixing it with a await as in await bulkCopy.WriteToServerAsync(...);. This way the task is linked to the parent task/thread and ensures the former (unless it is terminated by some other means) does not exit before the task completes.

Naturally the containing method and those in the call stack will need to be marked async and also have await prefixes on the corresponding methods. This "async all the way" creates a nice daisy chain of linked tasks all the way to the parent (or at least to the last method in the call chain with an await or a legacy ContinueWith()).

  • Related