Home > front end >  Faster way to loop through DataTable elements
Faster way to loop through DataTable elements

Time:09-30

Description of the current situation:

I have an excel file of approximately 315 columns and 4000 rows. The file contains the answers to a 300-question questionnaire. The data format is as follows:

(Headers)   A | B | C | D | E | F       | Q.1 | Q.2 | ... | Q.300 |
(FirstRow)  Info of first participant   | AnswerCode for every Q  | 

The columns A to F contain contain info on every participant, while the columns Q.1 to Q.300 contain the respective answer code to each question. After storing the file as a large DataTable:

I need to load all 4000 rows on an existing database table, but before I do that I must edit the data format. The end result must become:

ParticipantCode | QuestionCode | AnswerCode | DateOfRegistration
00001           | 0001         | 1234567    | yyyy-MM-dd HH:mm:ss
...             | ...          | ...        | ...
00001           | 0300         | 1234567    | yyyy-MM-dd HH:mm:ss
00002           | 0001         | 1234567    | yyyy-MM-dd HH:mm:ss
...             | ...          | ...        | ...
04000           | 0300         | 1234567    | yyyy-MM-dd HH:mm:ss

So every row of the original ExcelDataTable is transformed into 300 rows in the FinalDataTable. In this way, the FinalDataTable will have about 1.2 million rows.

What Have I implemented so far:

Private Function MyFunction()
    For Each ExcelRow As DataRow In ExcelDataTable.Rows
        For Each ExcelColumn As DataColumn In ExcelDataTable.Columns
            QuestionCodeFound = False
            ExcelColumnNameRaw = ExcelColumn.ColumnName.ToString.Trim
            If ExcelColumnNameRaw.StartsWith("Q") Then
                ' Correct the headers
                ExcelColumnSplit = ExcelColumnNameRaw.Split("#")
                ExcelColumnName = String.Concat(ExcelColumnSplit(0), ExcelColumnSplit(1))

                SelectedRowFromDT = QuestionCodeAndQuestionIDDataTable.Select("QuestionID = '"   ExcelColumnName   "'")

                ' Search for "_", because some questions are different
                If SelectedRowFromDT.Length > 0 Then
                    QuestionCodeFound = True
                Else
                    Dim ExcelColumnSplitForMult As String()
                    ExcelColumnSplitForMult = ExcelColumnName.Split("_")
                    SelectedRowFromDT = QuestionCodeAndQuestionIDDataTable.Select("QuestionID = '"   ExcelColumnSplitForMult(0).ToString   "'")
                    If SelectedRowFromDT.Length > 0 Then
                        QuestionCodeFound = True
                    End If
                End If
                If QuestionCodeFound Then
                    Dim QuestionCode As String
                    Dim QuestionTypeDataTable As DataTable
                    Dim QuestionType As String

                    ' Get the Question Type from the respective table
                    QuestionType = String.Empty
                    QuestionCode = SelectedRowFromDT(0).Item("QuestionCode").ToString
                    QuestionTypeDataTable = SearchInSql(My.Settings.ConnectionString, SQLString)

                    If QuestionTypeDataTable.Rows.Count > 0 Then
                        QuestionType = QuestionTypeDataTable.Rows(0).Item(0).ToString.Trim
                    End If

                    ' Fix the Date Format 
                    DateRaw = ExcelRow.Item(1).ToString
                    DateSplit = DateRaw.Split("/")
                    If DateSplit(0).Length = 1 Then
                        DateSplit(0) = String.Concat("0", DateSplit(0))
                    End If
                    If DateSplit(1).Length = 1 Then
                        DateSplit(1) = String.Concat("0", DateSplit(1))
                    End If
                    DateText = String.Concat(DateSplit(0), "/", DateSplit(1), "/", DateSplit(2))
                    DateRegistration = DateTime.ParseExact(DateText, "MM/dd/yyyy", CultureInfo.InvariantCulture)
                    DateRegistrationReformed = DateRegistration.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
                    DateRegFinal = DateTime.ParseExact((DateRegistrationReformed   " "   "10:00:00").ToString, "yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture)

                    Dim AnswerValue As String
                    Dim AnswerCode As String
                    Dim AnswerCodeDataTable As DataTable
                    Dim QuestionWasAnswer As String

                    Dim AnswerValueRow() As DataRow = ExcelDataTable.Select("ParticipantCode = '"   ExcelRow.Item(2).ToString   "'")

                    AnswerCodeDataTable = New DataTable
                    AnswerValue = ""
                    QuestionWasAnswer = "0"

                    ' Complete "QuestionWasAnswer" field for all questions and retrieve the AnswerCode for the answer given by each participant
                    If AnswerValueRow.Length > 0 And AnswerValueRow(0).Item(ExcelColumnNameRaw).GetType IsNot GetType(DBNull) Then
                        If Not (QuestionType.Equals("02") Or QuestionType.Equals("03")) Then
                            AnswerValue = AnswerValueRow(0).Item(ExcelColumnNameRaw)
                            QuestionWasAnswer = "1"
                        ElseIf QuestionType.Equals("02") Or QuestionType.Equals("03") Then
                            Dim ExcelColumnSplitForMultSecond As String()
                            Dim MultAnswerValue As String

                            ExcelColumnSplitForMultSecond = ExcelColumnName.Split("_")
                            MultAnswerValue = AnswerValueRow(0).Item(ExcelColumnNameRaw).ToString.Trim
                            AnswerValue = ExcelColumnSplitForMultSecond(1).ToString

                            If MultAnswerValue.Equals("1") Then
                                QuestionWasAnswer = "1"
                            ElseIf MultAnswerValue.Equals("2") Then
                                QuestionWasAnswer = "2"
                            End If
                        End If

                        ' Search in the Answers table for the existing AnswerCode
                        SQLString = String.Format("SELECT Answers.AnswerCode
                                                       FROM Answers
                                                       WHERE Answers.QuestionCode = '{0}'
                                                           AND (Answers.AnswerNumber = '{1}' OR Answers.Answer = '{1}')", QuestionCode, AnswerValue)
                        AnswerCodeDataTable = SearchInSql(My.Settings.ConnectionString, SQLString)

                        If AnswerCodeDataTable.Rows.Count > 0 Then
                            AnswerCode = AnswerCodeDataTable.Rows(0).Item(0).ToString
                            FormattedDataTable.Rows.Add(ParticipantAnswerCode, ExcelRow.Item(2), QuestionCode, AnswerCode, QuestionWasAnswer, DateRegFinal)
                            ParticipantAnswerCode = Convert.ToInt32(ParticipantAnswerCode   1).ToString.PadLeft(ParticipantAnswerCodeFieldLength, "0")
                        Else
                            ' If a given answer does not exist, save it in the respective table and then try again
                            Dim AnswerCodeLength = GetLengthFromSqlDataBase(My.Settings.ConnectionString, "Answers", "AnswerCode")
                            Dim NextAnswerCode = CalculateNextAnswerCode(AnswerCodeLength)
                            Dim NestAnswerNumber = CalculateNextAnswerNumber(QuestionCode)

                            SaveNewAnswer(NextAnswerCode, QuestionCode, NestAnswerNumber, AnswerValue)

                            SQLString = String.Format("SELECT Answers.AnswerCode
                                                       FROM Answers
                                                       WHERE Answers.QuestionCode = '{0}'
                                                           AND Answers.Answer = '{1}'", QuestionCode, AnswerValue)
                            AnswerCodeDataTable = SearchInSql(My.Settings.ConnectionString, SQLString)

                            If AnswerCodeDataTable.Rows.Count > 0 Then
                                AnswerCode = AnswerCodeDataTable.Rows(0).Item(0).ToString
                                FormattedDataTable.Rows.Add(ParticipantAnswerCode, ExcelRow.Item(2), QuestionCode, AnswerCode, QuestionWasAnswer, DateRegFinal)
                                ParticipantAnswerCode = Convert.ToInt32(ParticipantAnswerCode   1).ToString.PadLeft(ParticipantAnswerCodeFieldLength, "0")
                            End If
                        End If
                    End If
                End If
            End If
        Next
    Next
    Return FormattedDataTable
End Function

After that, I bulk insert the FinalDataTable on the DB.

The problem I am facing:

Using the current program I built, every row in the ExcelDataTable takes about 40 seconds to transform into 300 rows in the FinalDataTable. If I try to load all 4000 rows, it will take more than 40 hours to transform the entire datatable. I need to find a faster way to do this.

CodePudding user response:

As mentioned, there isn't much to go off of on this with what has been provided.

I'm sure there are more helpful fixes to consider but I wanted to put my two cents in about the For Loops.

I recommend switching the

For Each

statements with

For i as integer = 0 to ExcelDataTable.Rows.Count - 1

I've read that For Each is not as performance-friendly as it gathers each "row" as a collection, therefore increasing the overhead per loop.

Here is a SO post about this subject:

Major difference between 'for each' and 'for' loop in .NET

Not sure if that will make a difference for you but thought I would recommend it anyway.

  • Related