I need to parse both EML and MSG email files in Go. There's a wonderful package for parsing EML files. With MSG, however, every package I have researched and tried to use fails with the same error:
malformed MIME header: missing colon:
It isn't the msg file itself. I have the same service in .NET which reads the msg file perfectly (MsgReader library).
Could someone suggest a package I could use in Go to read msg files? I wonder if it's an encoding issue (this wasn't a problem with eml files).
I've tried using these packages:
- github.com/veqryn/go-email
- net/mail
- https://github.com/go-gomail/gomail
- github.com/jordan-wright/email
- github.com/emersion/go-message
- github.com/jpoehls/gophermail
As an example, here is one function I've tried for reading a .msg file.
func parseMsgFile() {
	filePath := "c://messages//kraken.msg"
	f, err := os.Open(filePath)
	checkerr(err, "file "+filePath+" not found or cannot be read")
	defer f.Close()

	// Wrap the file in a buffered reader and hand it to the parser.
	var reader io.Reader = bufio.NewReader(f)
	msg, err := email.ParseMessage(reader)
	checkerr(err, "failed to parse raw msg file")
	if msg == nil {
		checkerr(err, "failed to parse raw msg file")
	}
}
and the output when I call the function is:
malformed MIME header: missing colon: "\xd0\xcf\x11\u0871\x1a\xe1\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00>\x00\x03\x00\xfe\xff\t\x00\x06\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\t\x00\x00\x00\x02\x00\x00\x00\xfe\xff\xff\xff\x00\x00\x00\x00\x03\x00\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\x
ff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xffR\x00o\x00o\x00t\x00 \x00E\x00n\x00t\x00r\x00y\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x16\x00\x05\x00\xff\xff\xff\xff\xff\xff\xff\xff\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xf0\t-0r$\xd9\x01"
exit status 255
CodePudding user response:
Just to add to my comment: I searched Google for "msg parsers in go" and it brought up this repository: https://github.com/oucema001/OutlookMessageParser-Go . I don't know if it actually works. It's pretty old and has no documentation, so it's unlikely to be easy to use, but you can start from there.
CodePudding user response:
Here's the specification for Microsoft's Outlook Item File Format (*.msg), [MS-OXMSG]. And here's the specification for Microsoft's Compound File Binary File Format, [MS-CFB], the basis for the Outlook Item File Format (*.msg).
The Compound File Binary File Format is "a general-purpose file format that provides a file-system-like structure within a file for the storage of arbitrary, application-specific streams of data."
I believe that this stuff all came from Microsoft's old OLE/COM stuff (Object Linking and Embedding/Component Object Model).
FWIW, here's a parser for the Compound File Binary File Format. No idea if it works, or anything else about it, but it might be, at least, a jumping-off point for you.
https://github.com/richardlehane/mscfb
[Edited to note]
Seems that the above package is a dependency of https://github.com/oucema001/OutlookMessageParser-Go, referenced in this answer by @astax.
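Once a compound-file parser gives you the stream names and their bytes, the Outlook Item format stores most properties in streams named __substg1.0_ followed by eight hex digits: four for the property ID and four for the property type. Type 001F means a UTF-16LE string, the same encoding as the "R\x00o\x00o\x00t\x00" visible in the question's output. Below is a rough sketch of decoding such a stream with the standard library only. The helper names parseStreamName and decodeUTF16LE are made up for illustration, and the sample bytes in main are hand-written rather than read from a real file.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strconv"
	"strings"
	"unicode/utf16"
)

const substgPrefix = "__substg1.0_"

// parseStreamName (hypothetical helper) splits a single-valued property
// stream name such as "__substg1.0_0037001F" into its property ID
// (0x0037) and property type (0x001F). Names with any other shape,
// including multi-valued property streams, return ok == false.
func parseStreamName(name string) (id, typ uint16, ok bool) {
	if !strings.HasPrefix(name, substgPrefix) {
		return 0, 0, false
	}
	hex := name[len(substgPrefix):]
	if len(hex) != 8 {
		return 0, 0, false
	}
	v, err := strconv.ParseUint(hex, 16, 32)
	if err != nil {
		return 0, 0, false
	}
	return uint16(v >> 16), uint16(v), true
}

// decodeUTF16LE (hypothetical helper) converts little-endian UTF-16
// bytes to a Go string. A trailing odd byte is ignored.
func decodeUTF16LE(b []byte) string {
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		units = append(units, binary.LittleEndian.Uint16(b[i:]))
	}
	return string(utf16.Decode(units))
}

func main() {
	// Hand-written sample: property 0x0037 is the message subject,
	// type 0x001F is a Unicode (UTF-16LE) string.
	name := "__substg1.0_0037001F"
	data := []byte{'H', 0, 'i', 0}

	id, typ, ok := parseStreamName(name)
	if ok && typ == 0x001F {
		fmt.Printf("property 0x%04X = %q\n", id, decodeUTF16LE(data))
	}
}
```

In a real program the name and data would come from whatever the compound-file parser yields for each stream. Other property types (binary, ANSI strings, and so on) need their own decoding.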