Golang multiline regexp parsing issue

Time:10-19

I am building a Go project that parses Solidity code. It has a function, analyzeFile(), that statically detects issues in each smart contract (.sol file) using regular expressions:

func analyzeFile(issues []Issue, file string) (map[string][]Finding, error) {
    findings := make(map[string][]Finding)
    readFile, err := os.Open(file)
    if err != nil {
        return nil, err
    }
    defer readFile.Close()
    contents, _ := ioutil.ReadFile(file)
    scanner := bufio.NewScanner(readFile)
    lineNumber := 0
    for scanner.Scan() {
        lineNumber++
        line := scanner.Text()
        for _, issue := range issues {
            if issue.ParsingMode == "SingleLine" {
                matched, _ := regexp.MatchString(issue.Pattern, line)
                if matched {
                    findings[issue.Identifier] = append(findings[issue.Identifier], Finding{
                        IssueIdentifier: issue.Identifier,
                        File:            file,
                        LineNumber:      lineNumber,
                        LineContent:     strings.TrimSpace(line),
                    })
                }
            }
        }
    }

When a regex only has to check a single line, everything works. However, I also need to check constructs in the .sol files that span multiple lines, for instance to detect this piece of code:

require(
  _disputeID < disputeCount &&
  disputes[_disputeID].status == Status.Active,
  "Disputes::!Resolvable"
);

I tried to add the following code in the analyzeFile() function:

    contents, _ := ioutil.ReadFile(file)
    for _, issue := range issues {
        if issue.ParsingMode == "MultiLine" {
            contents_to_string := string(contents)
            //s := strings.ReplaceAll(contents_to_string, "\n", " ")
            //sr := strings.ReplaceAll(s, "\r", " ")
            r := regexp.MustCompile(`((require)([(])\n.*[&&](?s)(.*?)([;]))`)
            finds := r.FindStringSubmatch(contents_to_string)
            for _, find := range finds {
                findings[issue.Identifier] = append(findings[issue.Identifier], Finding{
                    IssueIdentifier: issue.Identifier,
                    File:            file,
                    LineContent:     (find),
                })
            }
        }
    }

But I get wrong results: once the source code is converted to a string, all the code ends up in a single string containing \n line-break characters, and my regex checks fail on it.

CodePudding user response:

One workaround is to capture the whole multi-line call with (?s)require\((.*?)\); and then split the captured group on \n:


func main() {
    var re = regexp.MustCompile(`(?s)require\((.*?)\);`)
    var str = `require(
  _disputeID < disputeCount &&
  disputes[_disputeID].status == Status.Active,
  "Disputes::!Resolvable"
);`

    matches := re.FindAllStringSubmatch(str, -1)
    for _, match := range matches {
        lines := strings.Split(match[1], "\n")
        for _, line := range lines {
            fmt.Println(line)
        }
    }
}

https://go.dev/play/p/Omn5ULHun_-
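
The (?s) flag is what lets this pattern cross line breaks: in Go's regexp syntax, `.` does not match \n unless (?s) is set. A minimal sketch of the difference follows; the helper name `matches` and the sample string `src` are made up for illustration and are not part of the original project.

```go
package main

import (
	"fmt"
	"regexp"
)

// src is a hypothetical multi-line require call used only for this demo.
const src = "require(\n  a < b &&\n  c == d,\n  \"msg\"\n);"

// matches reports whether pattern matches anywhere in s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	// Without (?s), "." stops at "\n", so the pattern cannot span lines.
	fmt.Println(matches(`require\((.*?)\);`, src)) // false

	// With (?s), "." also matches "\n", so the whole call is matched.
	fmt.Println(matches(`(?s)require\((.*?)\);`, src)) // true
}
```

The line breaks in the string are therefore not a problem in themselves; the pattern only has to be told that `.` may cross them.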


To match the individual lines, the pattern (?m)^[^\S\r\n]*(.*)[^\S\r\n](\S+)$ could be used. We can apply this multiline matching to the content between require( and );

func main() {
    var re = regexp.MustCompile(`(?s)require\((.*?)\);`)
    var str = `require(
  _disputeID < disputeCount &&
  disputes[_disputeID].status == Status.Active,
  "Disputes::!Resolvable"
);`

    var multilineRe = regexp.MustCompile(`(?m)^[^\S\r\n]*(.*)[^\S\r\n](\S+)$`)
    matches := re.FindAllStringSubmatch(str, -1)
    for _, match := range matches {
        submatches := multilineRe.FindAllStringSubmatch(match[1], -1)
        for _, submatch := range submatches {
            fmt.Println(submatch[0])
        }
    }
}

https://go.dev/play/p/LJsVy5vN6Ej
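
Note that (?m) and (?s) are independent flags: (?m) makes ^ and $ match at the start and end of every line, while (?s) only changes what `.` matches. The sketch below illustrates the (?m) half with a simplified per-line pattern; `countMatches` and the sample `body` are hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// countMatches reports how many non-overlapping matches pattern has in s.
func countMatches(pattern, s string) int {
	return len(regexp.MustCompile(pattern).FindAllString(s, -1))
}

func main() {
	// Hypothetical body of a require(...) call, three lines long.
	body := "  a < b &&\n  c == d,\n  \"msg\""

	// Without (?m), ^ and $ anchor only at the start and end of the whole text,
	// and "." cannot cross "\n", so nothing matches.
	fmt.Println(countMatches(`^[ \t]*(\S.*)$`, body)) // 0

	// With (?m), ^ and $ anchor at every line, so each line matches once.
	fmt.Println(countMatches(`(?m)^[ \t]*(\S.*)$`, body)) // 3
}
```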

CodePudding user response:

By tinkering with the code I managed to get it to work:

    contents, _ := ioutil.ReadFile(file)
    for _, issue := range issues {
        if issue.ParsingMode == "MultiLineG015" {
            str := string(contents)
            var re = regexp.MustCompile(`(?s)require\((.*?)\);`)
            //var multilineRe = regexp.MustCompile(`(?m)^[^\S\r\n]*(.*)[^\S\r\n](\S+)$`)
            //Getting all require in the sol file
            matches := re.FindAllStringSubmatch(str, -1)
            r := regexp.MustCompile(issue.Pattern)
            for _, match := range matches {
                submatches := r.FindAllStringSubmatch(match[0], -1)
                for _, submatch := range submatches {
                    findings[issue.Identifier] = append(findings[issue.Identifier], Finding{
                        IssueIdentifier: issue.Identifier,
                        File:            file,
                        LineContent:     ([]string{submatch[0]}),
                    })
                }
            }
        }
    }

This is the output:

2022-08-rigor\contracts\Community.sol::0 => [(
            _lendingNeeded >= _communityProject.totalLent &&
                _lendingNeeded <= IProject(_project).projectCost(),
            "Community::invalid lending"
        );]
2022-08-rigor\contracts\Disputes.sol::0 => [(
            _disputeID < disputeCount &&
                disputes[_disputeID].status == Status.Active,
            "Disputes::!Resolvable"
        );]
2022-08-rigor\contracts\Disputes.sol::0 => [(
            _actionType > 0 && _actionType <= uint8(ActionType.TaskPay),
            "Disputes::!ActionType"
        );]
2022-08-rigor\contracts\Project.sol::0 => [(
            _sender == builder || _sender == homeFi.communityContract(),
            "Project::!Builder&&!Community"
        );]

Thanks zangw for your help!
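
The `::0` in the output above suggests that LineNumber is never set for multi-line findings. One possible way to recover it, sketched here under the assumption that the starting line of each require call is what should be reported, is to use FindAllStringIndex and count the newlines before each match. The names `requireRe`, `lineOf` and `requireLines` are hypothetical and not part of the original code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// requireRe is the pattern from the answer, compiled once.
var requireRe = regexp.MustCompile(`(?s)require\((.*?)\);`)

// lineOf returns the 1-based line number of byte offset off in s.
func lineOf(s string, off int) int {
	return strings.Count(s[:off], "\n") + 1
}

// requireLines returns the starting line of every require(...); call in src.
func requireLines(src string) []int {
	var lines []int
	for _, loc := range requireRe.FindAllStringIndex(src, -1) {
		// loc[0] is the byte offset where the match begins.
		lines = append(lines, lineOf(src, loc[0]))
	}
	return lines
}

func main() {
	// Hypothetical contract source used only for this demo.
	src := "contract C {\n  function f() public {\n    require(\n      a < b &&\n      c == d,\n      \"C::bad\"\n    );\n    require(x > 0, \"C::x\");\n  }\n}\n"
	fmt.Println(requireLines(src)) // [3 8]
}
```

In analyzeFile(), the value returned by `lineOf` could be assigned to the LineNumber field of each Finding.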
