How to parse email addresses from a long string in Golang

Time:06-03

How can I extract only email addresses from a long string in Golang? For example:

"a bunch of irrelevant text fjewiwofjfjvnvkdlslsosiejwoqlwpwpwo
 [email protected],ou=f,c=US
 [email protected],ou=f,c=US
 [email protected],ou=f,c=US
 [email protected],ou=f,c=US
 [email protected],ou=people,ou=f,c=US
 [email protected],ou=f,c=US"

This would return a list of all the emails: [[email protected], [email protected], etc...]

Each email address would begin with "mail=" and end with a comma ",".

CodePudding user response:

For this you need to break the long string down into the parts you need. You can filter and search with regular expressions to match the email pattern shown above.

Here's a piece of code that uses a regular expression to first find each section starting with "mail=", then cleans up the result by removing the "mail=" prefix and the trailing comma:

package main

import (
    "fmt"
    "regexp"
    "strings"
)

func main() {
    // Match "mail=", then one or more address characters, then the trailing comma.
    var re = regexp.MustCompile(`(?m)mail=[A-Za-z.@0-9]+\,`)
    var str = `a bunch of irrelevant text fjewiwofjfjvnvkdlslsosiejwoqlwpwpwo
 [email protected],ou=f,c=US
 [email protected],ou=f,c=US
 [email protected],ou=f,c=US
 [email protected],ou=f,c=US
 [email protected],ou=people,ou=f,c=US
 [email protected],ou=f,c=US`

    for i, match := range re.FindAllString(str, -1) {
        fmt.Println(match, "found at index", i)

        // Drop the "mail=" prefix and the trailing comma to keep only the address.
        email := strings.Split(match, "=")[1]
        email = strings.ReplaceAll(email, ",", "")

        fmt.Println(email)
    }
}
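
As a variation on the same idea, a capture group can return the address directly, so no string clean-up is needed afterwards. This is only a minimal sketch: the helper name extractEmails, the pattern `mail=([^,\s]+),` and the example.com addresses are assumptions for illustration, and the pattern will not cope with a comma inside a quoted local part.

```go
package main

import (
	"fmt"
	"regexp"
)

// mailRe captures everything between "mail=" and the next comma.
// Assumption: the address itself contains no comma and no whitespace.
var mailRe = regexp.MustCompile(`mail=([^,\s]+),`)

// extractEmails is a hypothetical helper that returns every captured address.
func extractEmails(s string) []string {
	var emails []string
	for _, m := range mailRe.FindAllStringSubmatch(s, -1) {
		// m[0] is the whole match, m[1] is the first capture group.
		emails = append(emails, m[1])
	}
	return emails
}

func main() {
	// Made-up sample input in the same shape as the question's data.
	input := "irrelevant text\n mail=alice@example.com,ou=f,c=US\n mail=bob@example.org,ou=people,ou=f,c=US"
	for _, e := range extractEmails(input) {
		fmt.Println(e)
	}
}
```

FindAllStringSubmatch returns one slice per match, so the loop only has to pick out index 1.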

CodePudding user response:

While I agree with the comment from user datenwolf, here is another version that does not use regular expressions.

It also handles more complex email formats, including a comma within the local part, which is awkward to implement with a regexp.

See https://stackoverflow.com/a/2049510/11892070


package main

import (
    "bufio"
    "fmt"
    "strings"
)

var str = `a bunch of irrelevant text fjewiwofjfjvnvkdlslsosiejwoqlwpwpwo
[email protected],ou=f,c=US
[email protected],ou=f,c=US
[email protected],ou=f,c=US
[email protected],ou=f,c=US
[email protected],ou=people,ou=f,c=US
[email protected],ou=f,c=US
mail=(comented)[email protected],ou=f,c=US
mail="(with comma inside)arnold,[email protected]",ou=f,c=US
[email protected]`

func main() {

    var emails []string

    sc := bufio.NewScanner(strings.NewReader(str))

    for sc.Scan() {
        t := sc.Text()
        if !strings.HasPrefix(t, "mail=") {
            continue
        }
        t = t[5:]

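        // Look for the first comma after the @.
        at := strings.Index(t, "@")
        if at < 0 {
            // No @ on this line, so it cannot hold an email address.
            continue
        }
        comma := strings.Index(t[at:], ",")
        if comma < 0 {
            email := strings.TrimSpace(t)
            emails = append(emails, email)
            continue
        }
        // comma is relative to t[at:], so shift it back to an index into t.
        comma += at
        email := strings.TrimSpace(t[:comma])
        emails = append(emails, email)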
    }

    for _, e := range emails {
        fmt.Println(e)
    }

}
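
The same scan can be wrapped in a function so it is easier to reuse and test. This is a rough sketch rather than a definitive version: the name emailsFromLines and the example.com sample data are assumptions, and it trims leading whitespace first because the lines in the question start with a space, which the HasPrefix check would otherwise reject.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// emailsFromLines is a hypothetical helper: it keeps, for every line that
// starts with "mail=", the text up to the first comma found after the "@".
func emailsFromLines(s string) []string {
	var emails []string
	sc := bufio.NewScanner(strings.NewReader(s))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "mail=") {
			continue
		}
		value := strings.TrimPrefix(line, "mail=")
		at := strings.Index(value, "@")
		if at < 0 {
			continue // no "@", so not an address
		}
		end := len(value)
		if comma := strings.Index(value[at:], ","); comma >= 0 {
			end = at + comma // comma is relative to value[at:]
		}
		emails = append(emails, strings.TrimSpace(value[:end]))
	}
	return emails
}

func main() {
	// Made-up sample input, including a quoted local part with a comma.
	input := "junk\n mail=alice@example.com,ou=f,c=US\nmail=\"arnold,s@example.com\",ou=f,c=US\nmail=last@example.com"
	for _, e := range emailsFromLines(input) {
		fmt.Println(e)
	}
}
```

Because the comma search only starts at the "@", a comma inside the quoted local part is left alone.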