Index out of bounds error using Regex Split

Time:10-14

I'm posting another question here because the people who answered last time were extremely helpful. Bear in mind, I'm relatively new to VB.NET.

So I'm working on a program that pulls the first and third columns out of a text file using Regex.Split to eliminate the multiple spaces between the alphanumeric characters in the file.

A high level example of what the text file looks like is here:

VARIABLE1                MEAS1           STORAGE1
VARIABLE2                MEAS2           STORAGE2
VARIABLE3                MEAS3           STORAGE3
VARIABLE4                MEAS4           STORAGE4
VARIABLE5                MEAS5           STORAGE5
VARIABLE6                MEAS6           STORAGE6
                                           
#VARIABLE7         MEAS7           STORAGE7
VARIABLE8              MEAS8           STORAGE8
VARIABLE9            MEAS9           STORAGE9
VARIABLE10            MEAS10           STORAGE10
VARIABLE11            MEAS11           STORAGE11
VARIABLE12            MEAS12           STORAGE12
VARIABLE13            MEAS13           STORAGE13
VARIABLE14            MEAS14           STORAGE14

The file uses "#" to denote comments, so my code skips any line that contains that character. However, when I run a test function against the file, I keep getting an index out of bounds error, but only on some files; others in this same format work fine for some reason. Looking through the execution output, the error occurs right after the "STORAGE6" line is written, so something must go wrong between STORAGE6 and VARIABLE7, and I can't quite figure out what. Any insight on this would be extremely appreciated!

The test function I have written is below:

    Public Function Testing()
        OpenFileDialog1.ShowDialog()
        Dim file = System.IO.File.ReadAllLines(OpenFileDialog1.FileName)
        For Each line In file
            Dim arrWords() As String = System.Text.RegularExpressions.Regex.Split(line, "\s+")
            Dim upBound = arrWords.GetUpperBound(0)
            If upBound <> 0 Then

                If line.Contains("#") Or line.Length = 0 Then

                Else
                    Console.WriteLine(arrWords(0) + " " + arrWords(2))

                End If


            End If
        Next
    End Function

I get the out of bounds error when calling "arrWords(2)," which I'm sure was pretty obvious, but just trying to make the question as detailed as possible.

Answer:

The simple fix is changing these two lines:

    If upBound <> 0 Then
        If line.Contains("#") Or line.Length = 0 Then

like this:

    If upBound > 0 Then
        If line.TrimStart().StartsWith("#") OrElse String.IsNullOrWhiteSpace(line) Then
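
For context on why the original check lets the failing line through, here is a minimal sketch. It assumes the blank-looking line between STORAGE6 and #VARIABLE7 contains only spaces, as it appears to in the sample above; the module name SplitDemo and the variable names are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SplitDemo
    Sub Main()
        ' Assumption: a line made up of spaces only, like the separator line in the sample file.
        Dim blankLine As String = "      "
        Dim parts() As String = Regex.Split(blankLine, "\s+")

        ' The whole line matches the separator, so Split returns the two
        ' empty strings on either side of that single match.
        Console.WriteLine(parts.Length)                          ' 2
        Console.WriteLine(parts.GetUpperBound(0))                ' 1, so "upBound <> 0" passes

        ' The original guard only checks for an empty string, which this is not.
        Console.WriteLine(blankLine.Length = 0)                  ' False
        Console.WriteLine(String.IsNullOrWhiteSpace(blankLine))  ' True
    End Sub
End Module
```

With only indexes 0 and 1 available, reading arrWords(2) on such a line throws, which would match an error appearing right after the STORAGE6 line.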

But I'd really do something more like this:

    Public Class DataItem
        Public Property Variable As String
        Public Property Measure As String
        Public Property Storage As String
    End Class

    ' Lazily reads the file, skips comment and blank lines, and keeps only
    ' the lines that split into exactly three fields.
    Public Function ReadDataFile(fileName As String) As IEnumerable(Of DataItem)
        Return System.IO.File.ReadLines(fileName).
            Where(Function(line) Not line.TrimStart().StartsWith("#") AndAlso Not String.IsNullOrWhiteSpace(line)).
            Select(Function(line) System.Text.RegularExpressions.Regex.Split(line, "\s+")).
            Where(Function(fields) fields.Length = 3).
            Select(Function(fields)
                       Return New DataItem With {
                           .Variable = fields(0),
                           .Measure = fields(1),
                           .Storage = fields(2)}
                   End Function)
    End Function

    ' Declared as a Sub because it does not return a value.
    Public Sub Testing()
        If OpenFileDialog1.ShowDialog() = DialogResult.OK Then
            Dim records = ReadDataFile(OpenFileDialog1.FileName)
            For Each record In records
                Console.WriteLine($"{record.Variable} {record.Storage}")
            Next
        End If
    End Sub
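
One caveat worth checking against your files: if a line has leading or trailing spaces, Regex.Split returns an extra empty string at that end, so the line yields more than three fields. A possible alternative, which is not part of the answer above, is String.Split with StringSplitOptions.RemoveEmptyEntries. The sketch below is only an illustration; the module name SplitFieldsDemo and the helper SplitFields are hypothetical.

```vbnet
Imports System

Module SplitFieldsDemo
    ' Splits on runs of whitespace and discards empty entries.
    ' An empty separator array tells String.Split to treat whitespace as the delimiter.
    Function SplitFields(line As String) As String()
        Return line.Split(New Char() {}, StringSplitOptions.RemoveEmptyEntries)
    End Function

    Sub Main()
        ' Leading and trailing spaces no longer produce empty fields.
        Dim padded = SplitFields("  VARIABLE1     MEAS1    STORAGE1  ")
        Console.WriteLine(padded.Length)                ' 3
        Console.WriteLine(padded(0) & " " & padded(2))  ' VARIABLE1 STORAGE1

        ' A whitespace-only line produces no fields at all.
        Dim blank = SplitFields("      ")
        Console.WriteLine(blank.Length)                 ' 0
    End Sub
End Module
```

With this approach a single length check on the resulting array (three fields expected) covers blank lines and padded lines alike; comment lines would still need the StartsWith("#") test.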