Home > front end >  Split Arabic name characters and store the results in an array
Split Arabic name characters and store the results in an array

Time:02-04

I have an Arabic name and I am trying to split the Arabic characters and at the same time to identify the character if First, Middle or Last Here's my try but I couldn't fix the code

Sub Split_Arabic_Name()
    Dim sName As String, i As Long
    sName = "حاتم علاء خميس سيد"
    Dim result() As String
    Dim index As Integer
    index = 0
    For i = 1 To Len(sName)
        If AscW(Mid(sName, i, 1)) >= &HD800 And AscW(Mid(sName, i, 1)) <= &HDBFF Then
            ReDim Preserve result(index)
            result(index) = Mid(sName, i, 2)
            index = index   1
            i = i   1
        Else
            If Mid(sName, i, 1) <> " " Then
                ReDim Preserve result(index)
                result(index) = Mid(sName, i, 1)
                index = index   1
            End If
        End If
    Next i
    Dim arrName() As String
    ReDim arrName(0 To UBound(result), 0 To 1)
    Dim first As Integer, middle As Integer, last As Integer
    first = 0
    middle = 0
    last = 0
    For i = 0 To UBound(result)
        arrName(i, 0) = result(i)
        If (i = 0) Or (Mid(sName, i   1, 1) = " ") Then
            arrName(i, 1) = "First"
        ElseIf Mid(sName, i   1, 1) = " " Then
            arrName(i, 1) = "Last"
        Else
            arrName(i, 1) = "Middle"
        End If
    Next i
    Range("H1").Resize(UBound(arrName, 1)   1, UBound(arrName, 2)   1).Value = arrName
End Sub

The spaces are considered as marks. While the name starts with ح so this is [First] -- then ا then ت [these are Middle] --- then م followed be space so this is [Last]

the second name after space which is ع , should be [First] -- then the characters ل then ا [Middle] --- and the character ء [Last] as it is before the space .. and so on

as for the last name سيد , the [First] is س but ي [Middle] --and the last character in the whole name not followed be space but it is the last character so it is [Last]

This is a snapshot of the results I got and I typed the remarks enter image description here

CodePudding user response:

As per my comment, a regular expression might not be a bad idea here:

Sub Test()

Dim s As String: s = [A1]
Dim x As Long: x = 1
Dim tst As String

With CreateObject("vbscript.regexp")
    .Global = True
    .Pattern = "(?:^|\s)(\S)|(\S)(?!\s|$)|(\S)(?=\s|$)"
    Set matches = .Execute(s)
    If Not matches Is Nothing Then
        For Each Match In matches
            x = x   1
            tst = Match.Submatches(0) & Match.Submatches(1) & Match.Submatches(2)
            Select Case tst
                Case Match.Submatches(0)
                    Cells(x, 2).Value = "First"
                Case Match.Submatches(1)
                    Cells(x, 2).Value = "Middle"
                Case Match.Submatches(2)
                    Cells(x, 2).Value = "Last"
            End Select
            Cells(x, 1).Value = tst
        Next
    End If
End With

End Sub

enter image description here

The idea behind the pattern (?:^|\s)(\S)|(\S)(?!\s|$)|(\S)(?=\s|$) is to catch every character other than whitespace as a seperate match in their own respective group. The regex engine does recognize that it needs to read the input right to left. To break this pattern down:

  • (?:^|\s)(\S) - A single non-whitespace character that is preceded by the start of the input or a whitespace character;
  • (\S)(?!\s|$) - A non-whitespace character not followed by a whitespace character nor the end-line. This does catch the correct characters apart from the first character because of the order or the alternations in the pattern;
  • (\S)(?=\s|$) - A non-whitespace character that is followed by a whitespace character nor the end-line. This does catch the correct characters apart from the first character because of the order or the alternations in the pattern.

So each match in group 1 is a 'First', each match in group 2 is 'Middle' and each match in group 3 is 'Last'.

See an online demo

  • Related