Import Large CSV File to SQL Server Line by Line


I have been trying to load a large CSV file into SQL Server using C# through an SSIS script task, and it works for files with about 600,000 records or so. But we have one file with around 1,000,000 records, and it causes my server to slow to a crawl and the CPU to spike. My problem is that I am loading the whole file into memory instead of processing it line by line, which would work quickly. I tried StreamReader.ReadLine as well as System.IO.File.ReadAllLines, but both times everything is loaded into memory before inserting. I then changed the code to load line by line, but with the code below it inserts the first line, then moves on and unfortunately inserts the first and second, then the first, second, and third, and so on. I'm not sure where my logic is wrong, since it never clears out the lines it has already inserted. Can someone help with the code below?

            // Lazily yield one line at a time instead of reading the whole file
            System.Collections.Generic.IEnumerable<string> ReadAsLines(string filename)
            {
                using (var reader = new StreamReader(filename))
                {
                    while (!reader.EndOfStream)
                        yield return reader.ReadLine();
                }
            }

            {
                var reader = ReadAsLines(sodFileName);
                var data = new DataTable();

                //Get column headers
                var headers = reader.First().Split(',');
                foreach (var header in headers)
                    data.Columns.Add(header);

                //Get records
                var records = reader.Skip(1);
                foreach (var record in records)
                    data.Rows.Add(record.Split(','));

                //Load records into table
                var sqlBulk = new SqlBulkCopy(Conn);
                sqlBulk.BulkCopyTimeout = 0;
                sqlBulk.DestinationTableName = "dbo.sodFileName";
                sqlBulk.WriteToServer(data);
            }

Thanks!

EDIT - After some research I am also trying the following using File.ReadLines, but I get a compiler error on the line lines[0].Split(',');

Compiler Error CS0021: Cannot apply indexing with [] to an expression of type 'type'

Code:

                var lines = System.IO.File.ReadLines(sodFileName);
                if (lines.Count() == 0) return;
                var columns = lines[0].Split(',');
                var table = new DataTable();
                foreach (var c in columns)
                    table.Columns.Add(c);
                for (int i = 1; i < lines.Count() - 1; i++)
                    table.Rows.Add(lines[i].Split(','));
                var sqlBulk = new SqlBulkCopy(Conn);
                sqlBulk.BulkCopyTimeout = 0;
                sqlBulk.DestinationTableName = "dbo.sodFileName";
                sqlBulk.WriteToServer(table);

Would this be a better way and would anyone know how to fix that error?

CodePudding user response:

Consider setting BatchSize to 10,000. If the single large transaction is a problem, or transaction log growth is an issue, consider the UseInternalTransaction option, provided it is acceptable for some batches to import while those that hit errors are skipped.

"You can explicitly specify the UseInternalTransaction option in the SqlBulkCopy class constructor to explicitly cause a bulk copy operation to execute in its own transaction, causing each batch of the bulk copy operation to execute within a separate transaction."

https://docs.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlbulkcopy.batchsize?view=dotnet-plat-ext-6.0

https://docs.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlbulkcopyoptions?view=dotnet-plat-ext-6.0

https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/sql/transaction-and-bulk-copy-operations

CodePudding user response:

Please check this solution,

https://www.codeproject.com/Articles/685310/Simple-and-Fast-CSV-Library-in-Csharp

This is a memory-efficient solution; I was able to import .csv files with millions of records.
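If you would rather not take a dependency, the same memory-efficient idea can be sketched by hand: stream the file with File.ReadLines (which is lazy) and flush the rows to SqlBulkCopy in fixed-size batches, so the whole file never sits in memory. This is only an illustration under the assumption of a simple comma-separated file with a header row and no quoted fields; the method name LoadCsvInBatches is made up for the sketch.

    // Illustrative sketch only (assumes simple CSV: header row, no quoted/escaped commas)
    // Requires: using System.Data; using System.Data.SqlClient; using System.IO;
    static void LoadCsvInBatches(string path, SqlConnection conn, string destTable, int batchSize = 10000)
    {
        DataTable table = null;

        using (var bulk = new SqlBulkCopy(conn.ConnectionString, SqlBulkCopyOptions.UseInternalTransaction))
        {
            bulk.DestinationTableName = destTable;
            bulk.BulkCopyTimeout = 0;
            bulk.BatchSize = batchSize;

            foreach (var line in File.ReadLines(path))      // lazy, one line at a time
            {
                var fields = line.Split(',');

                if (table == null)                          // first line is the header
                {
                    table = new DataTable();
                    foreach (var header in fields)
                        table.Columns.Add(header);
                    continue;
                }

                table.Rows.Add(fields);

                if (table.Rows.Count >= batchSize)          // flush the batch and free memory
                {
                    bulk.WriteToServer(table);
                    table.Rows.Clear();
                }
            }

            if (table != null && table.Rows.Count > 0)      // remaining partial batch
                bulk.WriteToServer(table);
        }
    }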
