JavaScript regex for multi-line base64

I have the following from a MIME message:

--------------ra650umTsDNeI5lwXmFy5luF
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: base64

TG9yZW0gSXBzdW0NCg0KSGVyZSBpcyBzb21lIG1vcmUgdGV4dA0KDQpOb3cgb24gYSAzcmQg
bGluZQ0KDQoNClRoYW5rcw0KDQo=

--------------ra650umTsDNeI5lwXmFy5luF--

I want to extract the base64 encoded message, regardless of how many lines it is.

The following regex does find a match on each individual line, but how can I group them so that consecutive lines of base64 are returned together as a single match?

var base64Regex = /^(?:[A-Za-z0-9+\/]{4})*(?:[A-Za-z0-9+\/]{4}|[A-Za-z0-9+\/]{3}=|[A-Za-z0-9+\/]{2}={2})$/gm

When the MIME content also contains, for example, a PGP signature, this gives me 4 or 5 matches. I can't simply join all the matches, because the regex picks up the base64 in the signature as well.

Ideally I'd modify this so that everything from (and including) the first matching line up to the ---------- boundary is "match 1", and if it finds another block of base64, that is "match 2", and so on.

Here is a link to regex101 showing 2 matches. In short, I would like for this to be one match.

https://regex101.com/r/32WjKa/1

CodePudding user response：

Would this help?

var base64Regex = /Content-Transfer-Encoding: base64([\s\S]*?)\s*?--/g;

Content-Transfer-Encoding: base64 - This is the start of the base64 encoded message.

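[\s\S]*? - This is the base64 encoded message. [\s\S] matches any character, including line breaks, so the message can span multiple lines; the lazy *? stops at the first possible end.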

\s*?-- - This is the end of the base64 encoded message: optional whitespace followed by the -- that starts the next boundary line.

g - This is the global flag, so that the regex matches every base64 block in the message, not just the first.
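
As a rough sketch of how this pattern could be used, the snippet below runs it with matchAll so that each base64 block comes back as one match, however many lines it spans. The two-part sample message, the helper name extractBase64Blocks, and the use of Node's Buffer for decoding are assumptions for illustration, not part of the original answer.

```javascript
// Hypothetical two-part message; the second part stands in for any further base64 block.
const mime = [
  '--------------ra650umTsDNeI5lwXmFy5luF',
  'Content-Type: text/plain; charset=UTF-8; format=flowed',
  'Content-Transfer-Encoding: base64',
  '',
  'TG9yZW0gSXBzdW0NCg0KSGVyZSBpcyBzb21lIG1vcmUgdGV4dA0KDQpOb3cgb24gYSAzcmQg',
  'bGluZQ0KDQoNClRoYW5rcw0KDQo=',
  '',
  '--------------ra650umTsDNeI5lwXmFy5luF',
  'Content-Type: text/plain; charset=UTF-8',
  'Content-Transfer-Encoding: base64',
  '',
  'SGVsbG8=',
  '',
  '--------------ra650umTsDNeI5lwXmFy5luF--',
].join('\n');

// Returns one string per base64 block, with the wrapping whitespace removed.
function extractBase64Blocks(text) {
  const re = /Content-Transfer-Encoding: base64([\s\S]*?)\s*?--/g;
  return Array.from(text.matchAll(re), (m) => m[1].replace(/\s+/g, ''));
}

const blocks = extractBase64Blocks(mime);

// Decoding assumes Node.js (Buffer) and UTF-8 text content.
const decoded = blocks.map((b) => Buffer.from(b, 'base64').toString('utf8'));

console.log(blocks.length); // 2
console.log(decoded[1]);    // Hello
```

The pattern ends at the first --, which is safe here only because the standard base64 alphabet contains no hyphen. Anchoring on the Content-Transfer-Encoding header also means that base64-looking text elsewhere in the message, such as a PGP signature, is skipped unless it follows that header.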

CodePudding user response：

Instead of looking for base64 characters, I'd look for all characters (including newlines) between the start and end of the MIME part's payload.

By default, . in JavaScript regexes won't match line breaks, even in multi-line mode. The s (dotAll) flag changes that and allows . to match line breaks.
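
A small illustration of that difference, using a made-up two-line string (the variable names are only for this sketch):

```javascript
const sample = 'first line\nsecond line';

// Without the s flag, "." stops at the line break, even with the m flag set.
const withoutS = sample.match(/first.+/m)[0];

// With the s flag, "." also matches "\n", so the match runs across both lines.
const withS = sample.match(/first.+/s)[0];

console.log(JSON.stringify(withoutS)); // "first line"
console.log(JSON.stringify(withS));    // "first line\nsecond line"
```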

With this method, you can remove the line breaks from the captured text afterwards with a simple replace().

const str = `--------------ra650umTsDNeI5lwXmFy5luF
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: base64

TG9yZW0gSXBzdW0NCg0KSGVyZSBpcyBzb21lIG1vcmUgdGV4dA0KDQpOb3cgb24gYSAzcmQg
bGluZQ0KDQoNClRoYW5rcw0KDQo=

--------------ra650umTsDNeI5lwXmFy5luF--`

const payload = str.match(/base64\n\n(.+)\n\n--------------.+/ms)[1].replace(/\n/g, '')
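
One caveat: MIME messages normally use CRLF line endings on the wire, in which case a literal \n\n would not match. A possible variant, assuming the same dashed-boundary layout as the sample, accepts either line ending and uses a lazy quantifier so the capture stops at the first boundary. The function name extractPayload and the test message are hypothetical.

```javascript
function extractPayload(message) {
  // \r?\n accepts both LF and CRLF; (.+?) is lazy, so it ends at the first boundary line.
  const match = message.match(/base64\r?\n\r?\n(.+?)\r?\n\r?\n--/s);
  if (match === null) return null; // no base64 part found
  // Remove the line breaks that wrap the base64 text.
  return match[1].replace(/\r?\n/g, '');
}

// The sample message from above, rebuilt with CRLF line endings.
const crlfMessage = [
  '--------------ra650umTsDNeI5lwXmFy5luF',
  'Content-Type: text/plain; charset=UTF-8; format=flowed',
  'Content-Transfer-Encoding: base64',
  '',
  'TG9yZW0gSXBzdW0NCg0KSGVyZSBpcyBzb21lIG1vcmUgdGV4dA0KDQpOb3cgb24gYSAzcmQg',
  'bGluZQ0KDQoNClRoYW5rcw0KDQo=',
  '',
  '--------------ra650umTsDNeI5lwXmFy5luF--',
].join('\r\n');

console.log(extractPayload(crlfMessage));
```

Returning null instead of indexing straight into the match result avoids a TypeError when the message has no base64 part.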

You might also be better off using a dedicated MIME parser rather than a regex, since payloads like this follow a standard format.