Below is code to read data from .CSV file which has 3 columns (keeps varying). Sometimes when user opens the .csv in Excel and deletes a row in it, the row gets read as an empty or whitespace row with comma separation.
When same is read through code I get ",," as input which gets added to my dataTable as an empty row. How can I escape this blank row?
Dim sreader As StreamReader
Dim sstring As String
Dim dt As DataTable
Dim counter as Integer
sreader = File.OpenText(Path.ToString) 'this path is path of the excel
While sreader.Peek <> -1
sstring = sreader.Readline()
If sstring <> " " then ' how can I check here that the string does not have any content in it except for the seperating commas
Dim str As String () = sstring.Split(",")
Dim rowdt As DataRow
rowdt = dt.NewRow()
For i As Integer = 0 To dt.Columns.count-1
rowdt(i) = str(i).ToString()
Next
dt.rows.Add(rowdt)
End if
Counter = counter 1
End While
I tried something. Have posted in Answer section
This is what I tried
Dim sreader As StreamReader
Dim counter as Integer
Dim sstring As String
Dim dt As DataTable
sreader = File.OpenText(Path.ToString)
While sreader.Peek <> -1
sstring = sreader.Readline()
Dim no as integer = 0
For each str as String in sstring.Split(",")
If str.ToString.Trim = "" then
no = no 1
End If
Next
If no <> 3 then
Dim str As String () = sstring.Split(",")
Dim rowdt As DataRow
rowdt = dt.NewRow()
For i As Integer = 0 To dt.Columns.count-1
rowdt(i) = str(i).ToString()
Next
dt.rows.Add(rowdt)
End if
End if
counter = counter 1
End While
CodePudding user response:
Read all the lines, then process only lines where there is any value between commas
Dim path = "filename.txt"
Dim dt As New DataTable()
dt.Columns.AddRange(
{
New DataColumn("Column1"), New DataColumn("Column2"),
New DataColumn("Column3"), New DataColumn("Column4"),
New DataColumn("Column5"), New DataColumn("Column6"),
New DataColumn("Column7"), New DataColumn("Column8"),
New DataColumn("Column9"), New DataColumn("Column10")
})
Dim sw As New Stopwatch()
sw.Start()
Dim lines = File.ReadAllLines(Path)
For Each line In lines
Dim split = line.Split({","c}, StringSplitOptions.None)
If split.Any(Function(s) Not String.IsNullOrWhiteSpace(s)) Then
Dim row = dt.NewRow()
For i As Integer = 0 To dt.Columns.Count - 1
row(i) = split(i).ToString()
Next
dt.Rows.Add(row)
End If
Next
sw.Stop()
Console.WriteLine($"Took {sw.ElapsedMilliseconds} ms")
Console.WriteLine($"Read {dt.Rows.Count()} rows")
Tested to address performance concerns
File contents 1024 lines of a,b,c,d,e,f,g,h,i,j
and some lines of ,,,,,,,,,
mixed in, including the final line of the file
Final 10 lines of the file:
a,b,c,d,e,f,g,h,i,j
a,b,c,d,e,f,g,h,i,j
,,,,,,,,,
,,,,,,,,,
a,b,c,d,e,f,g,h,i,j
a,b,c,d,e,f,g,h,i,j
,,,,,,,,,
a,b,c,d,e,f,g,h,i,j
a,b,c,d,e,f,g,h,i,j
,,,,,,,,,
A StopWatch object shows it takes 2ms to read all lines. And there are exactly 1024 lines of data in the resulting DataTable. The processor skips rows with no values
Took 2 ms
Read 1024 rows
CodePudding user response:
I have tried splitting and checking the string. Hope it works.
Dim sreader As StreamReader
Dim counter as Integer
Dim sstring As String
Dim dt As DataTable
sreader = File.OpenText(Path.ToString)
While sreader.Peek <> -1
sstring = sreader.Readline()
Dim no as integer = 0
For each str as String in sstring.Split(",")
If str.ToString.Trim = "" then
no = no 1
End If
Next
If no <> 3 then
Dim str As String () = sstring.Split(",")
Dim rowdt As DataRow
rowdt = dt.NewRow()
For i As Integer = 0 To dt.Columns.count-1
rowdt(i) = str(i).ToString()
Next
dt.rows.Add(rowdt)
End if
End if
counter = counter 1
End While