Description of the current situation:
I have an excel file of approximately 315 columns and 4000 rows. The file contains the answers to a 300-question questionnaire. The data format is as follows:
(Headers) A | B | C | D | E | F | Q.1 | Q.2 | ... | Q.300 |
(FirstRow) Info of first participant | AnswerCode for every Q |
The columns A to F contain contain info on every participant, while the columns Q.1 to Q.300 contain the respective answer code to each question. After storing the file as a large DataTable:
I need to load all 4000 rows on an existing database table, but before I do that I must edit the data format. The end result must become:
ParticipantCode | QuestionCode | AnswerCode | DateOfRegistration
00001 | 0001 | 1234567 | yyyy-MM-dd HH:mm:ss
... | ... | ... | ...
00001 | 0300 | 1234567 | yyyy-MM-dd HH:mm:ss
00002 | 0001 | 1234567 | yyyy-MM-dd HH:mm:ss
... | ... | ... | ...
04000 | 0300 | 1234567 | yyyy-MM-dd HH:mm:ss
So every row of the original ExcelDataTable is transformed into 300 rows in the FinalDataTable. In this way, the FinalDataTable will have about 1.2 million rows.
What Have I implemented so far:
Private Function MyFunction()
For Each ExcelRow As DataRow In ExcelDataTable.Rows
For Each ExcelColumn As DataColumn In ExcelDataTable.Columns
QuestionCodeFound = False
ExcelColumnNameRaw = ExcelColumn.ColumnName.ToString.Trim
If ExcelColumnNameRaw.StartsWith("Q") Then
' Correct the headers
ExcelColumnSplit = ExcelColumnNameRaw.Split("#")
ExcelColumnName = String.Concat(ExcelColumnSplit(0), ExcelColumnSplit(1))
SelectedRowFromDT = QuestionCodeAndQuestionIDDataTable.Select("QuestionID = '" ExcelColumnName "'")
' Search for "_", because some questions are different
If SelectedRowFromDT.Length > 0 Then
QuestionCodeFound = True
Else
Dim ExcelColumnSplitForMult As String()
ExcelColumnSplitForMult = ExcelColumnName.Split("_")
SelectedRowFromDT = QuestionCodeAndQuestionIDDataTable.Select("QuestionID = '" ExcelColumnSplitForMult(0).ToString "'")
If SelectedRowFromDT.Length > 0 Then
QuestionCodeFound = True
End If
End If
If QuestionCodeFound Then
Dim QuestionCode As String
Dim QuestionTypeDataTable As DataTable
Dim QuestionType As String
' Get the Question Type from the respective table
QuestionType = String.Empty
QuestionCode = SelectedRowFromDT(0).Item("QuestionCode").ToString
QuestionTypeDataTable = SearchInSql(My.Settings.ConnectionString, SQLString)
If QuestionTypeDataTable.Rows.Count > 0 Then
QuestionType = QuestionTypeDataTable.Rows(0).Item(0).ToString.Trim
End If
' Fix the Date Format
DateRaw = ExcelRow.Item(1).ToString
DateSplit = DateRaw.Split("/")
If DateSplit(0).Length = 1 Then
DateSplit(0) = String.Concat("0", DateSplit(0))
End If
If DateSplit(1).Length = 1 Then
DateSplit(1) = String.Concat("0", DateSplit(1))
End If
DateText = String.Concat(DateSplit(0), "/", DateSplit(1), "/", DateSplit(2))
DateRegistration = DateTime.ParseExact(DateText, "MM/dd/yyyy", CultureInfo.InvariantCulture)
DateRegistrationReformed = DateRegistration.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
DateRegFinal = DateTime.ParseExact((DateRegistrationReformed " " "10:00:00").ToString, "yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture)
Dim AnswerValue As String
Dim AnswerCode As String
Dim AnswerCodeDataTable As DataTable
Dim QuestionWasAnswer As String
Dim AnswerValueRow() As DataRow = ExcelDataTable.Select("ParticipantCode = '" ExcelRow.Item(2).ToString "'")
AnswerCodeDataTable = New DataTable
AnswerValue = ""
QuestionWasAnswer = "0"
' Complete "QuestionWasAnswer" field for all questions and retrieve the AnswerCode for the answer given by each participant
If AnswerValueRow.Length > 0 And AnswerValueRow(0).Item(ExcelColumnNameRaw).GetType IsNot GetType(DBNull) Then
If Not (QuestionType.Equals("02") Or QuestionType.Equals("03")) Then
AnswerValue = AnswerValueRow(0).Item(ExcelColumnNameRaw)
QuestionWasAnswer = "1"
ElseIf QuestionType.Equals("02") Or QuestionType.Equals("03") Then
Dim ExcelColumnSplitForMultSecond As String()
Dim MultAnswerValue As String
ExcelColumnSplitForMultSecond = ExcelColumnName.Split("_")
MultAnswerValue = AnswerValueRow(0).Item(ExcelColumnNameRaw).ToString.Trim
AnswerValue = ExcelColumnSplitForMultSecond(1).ToString
If MultAnswerValue.Equals("1") Then
QuestionWasAnswer = "1"
ElseIf MultAnswerValue.Equals("2") Then
QuestionWasAnswer = "2"
End If
End If
' Search in the Answers table for the existing AnswerCode
SQLString = String.Format("SELECT Answers.AnswerCode
FROM Answers
WHERE Answers.QuestionCode = '{0}'
AND (Answers.AnswerNumber = '{1}' OR Answers.Answer = '{1}')", QuestionCode, AnswerValue)
AnswerCodeDataTable = SearchInSql(My.Settings.ConnectionString, SQLString)
If AnswerCodeDataTable.Rows.Count > 0 Then
AnswerCode = AnswerCodeDataTable.Rows(0).Item(0).ToString
FormattedDataTable.Rows.Add(ParticipantAnswerCode, ExcelRow.Item(2), QuestionCode, AnswerCode, QuestionWasAnswer, DateRegFinal)
ParticipantAnswerCode = Convert.ToInt32(ParticipantAnswerCode 1).ToString.PadLeft(ParticipantAnswerCodeFieldLength, "0")
Else
' If a given answer does not exist, save it in the respective table and then try again
Dim AnswerCodeLength = GetLengthFromSqlDataBase(My.Settings.ConnectionString, "Answers", "AnswerCode")
Dim NextAnswerCode = CalculateNextAnswerCode(AnswerCodeLength)
Dim NestAnswerNumber = CalculateNextAnswerNumber(QuestionCode)
SaveNewAnswer(NextAnswerCode, QuestionCode, NestAnswerNumber, AnswerValue)
SQLString = String.Format("SELECT Answers.AnswerCode
FROM Answers
WHERE Answers.QuestionCode = '{0}'
AND Answers.Answer = '{1}'", QuestionCode, AnswerValue)
AnswerCodeDataTable = SearchInSql(My.Settings.ConnectionString, SQLString)
If AnswerCodeDataTable.Rows.Count > 0 Then
AnswerCode = AnswerCodeDataTable.Rows(0).Item(0).ToString
FormattedDataTable.Rows.Add(ParticipantAnswerCode, ExcelRow.Item(2), QuestionCode, AnswerCode, QuestionWasAnswer, DateRegFinal)
ParticipantAnswerCode = Convert.ToInt32(ParticipantAnswerCode 1).ToString.PadLeft(ParticipantAnswerCodeFieldLength, "0")
End If
End If
End If
End If
End If
Next
Next
Return FormattedDataTable
End Function
After that, I bulk insert the FinalDataTable on the DB.
The problem I am facing:
Using the current program I built, every row in the ExcelDataTable takes about 40 seconds to transform into 300 rows in the FinalDataTable. If I try to load all 4000 rows, it will take more than 40 hours to transform the entire datatable. I need to find a faster way to do this.
CodePudding user response:
As mentioned, there isn't much to go off of on this with what has been provided.
I'm sure there are more helpful fixes to consider but I wanted to put my two cents in about the For Loops.
I recommend switching the
For Each
statements with
For i as integer = 0 to ExcelDataTable.Rows.Count - 1
I've read that For Each is not as performance-friendly as it gathers each "row" as a collection, therefore increasing the overhead per loop.
Here is a SO post about this subject:
Major difference between 'for each' and 'for' loop in .NET
Not sure if that will make a difference for you but thought I would recommend it anyway.