How to make a Go regexp stop matching at the next option label?

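This is an example of a multiple-choice question. The stem reads roughly "Currently, besides the US site, the back end of Amazon's US site also includes the (    ) sites." I want to extract the option texts "英国、法国" (UK, France), "加拿大、墨西哥" (Canada, Mexico), "葡萄牙、加拿大" (Portugal, Canada) and "墨西哥、德国" (Mexico, Germany) from the text in the following Go code, but my regexp does not work.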

package main

import (
    "fmt"
    "regexp"
    "testing"
)

func TestRegex(t *testing.T) {
    text := `（ B ）38.目前，亚马逊美国站后台，除了有美国站点外，还有(    )站点。
A.英国、法国B.加拿大、墨西哥
C.葡萄牙、加拿大D.墨西哥、德国
`

    fmt.Printf("%q\n", regexp.MustCompile(`[A-E]\.(\S+)?`).FindAllStringSubmatch(text, -1))
    fmt.Printf("%q\n", regexp.MustCompile(`[A-E]\.`).Split(text, -1))
}

Pattern: [A-E]\.(\S+)?

Actual result: [["A.英国、法国B.加拿大、墨西哥" "英国、法国B.加拿大、墨西哥"] ["C.葡萄牙、加拿大D.墨西哥、德国" "葡萄牙、加拿大D.墨西哥、德国"]].

Expected result: [["A.英国、法国" "英国、法国"] ["B.加拿大、墨西哥" "加拿大、墨西哥"] ["C.葡萄牙、加拿大" "葡萄牙、加拿大"] ["D.墨西哥、德国" "墨西哥、德国"]]

I think this might be a greediness problem, because my pattern reads option A and option B as a single option.

Answer:

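Non-greedy matching won't solve this. What you need is a positive lookahead such as (?=[A-E]\.), which stops the match before the next label without consuming it, but RE2, the syntax Go's regexp package implements, doesn't support lookarounds.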

As a workaround, you can search for the labels alone and slice out the text between them manually:

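re := regexp.MustCompile(`[A-E]\.`)
// Locate every option label; each entry is a [start, end) byte range.
res := re.FindAllStringIndex(text, -1)
results := make([][]string, len(res))
for i, m := range res {
    if i < len(res)-1 {
        // The option text runs until the start of the next label.
        results[i] = []string{text[m[0]:m[1]], text[m[1]:res[i+1][0]]}
    } else {
        // The last option text runs to the end of the input.
        results[i] = []string{text[m[0]:m[1]], text[m[1]:]}
    }
}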

fmt.Printf("%q\n", results)

This should print:

[["A." "英国、法国"] ["B." "加拿大、墨西哥\n"] ["C." "葡萄牙、加拿大"] ["D." "墨西哥、德国\n"]]