Javascript Regex multi-line base64

Time:11-21

I have the following from a MIME message:

--------------ra650umTsDNeI5lwXmFy5luF
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: base64

TG9yZW0gSXBzdW0NCg0KSGVyZSBpcyBzb21lIG1vcmUgdGV4dA0KDQpOb3cgb24gYSAzcmQg
bGluZQ0KDQoNClRoYW5rcw0KDQo=

--------------ra650umTsDNeI5lwXmFy5luF--

I want to extract the base64 encoded message, regardless of how many lines it is.

The following regex does find a match on each individual line, but how can I group them so that consecutive lines of base64 are treated as a single match?

var base64Regex = /^(?:[A-Za-z0-9+\/]{4})*(?:[A-Za-z0-9+\/]{4}|[A-Za-z0-9+\/]{3}=|[A-Za-z0-9+\/]{2}={2})$/gm

When the MIME content also contains, for example, a PGP signature, this gives me 4 or 5 matches, so I can't simply join them all: the base64 lines of the signature would be included as well.

Ideally, I'd modify this so that everything from (and including) the first matching line up to the ---------- boundary counts as "match 1", the next block of base64 counts as "match 2", and so on.

Here is a link to regex101 showing 2 matches. In short, I would like for this to be one match.

https://regex101.com/r/32WjKa/1

CodePudding user response:

Would this help?

var base64Regex = /Content-Transfer-Encoding: base64([\s\S]*?)\s*?--/g;

Content-Transfer-Encoding: base64 - This is the start of the base64 encoded message.

[\s\S]*? - This is the base64 encoded message. It can be on multiple lines.

\s*?-- - This is the end of the base64 encoded message: optional whitespace followed by the -- that starts the next boundary line.

g - This is the global flag, so that it will match all instances of the regex
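
To illustrate how this pattern could collect each block as a single match, here is a minimal sketch. It assumes a Node.js environment (for Buffer); the helper name extractBase64Blocks and the sample variable are made up for illustration. It also assumes Content-Transfer-Encoding is the last header of the part, since any header after it would end up in the capture group.

```javascript
// Hypothetical helper: returns one string per base64 block, however many
// lines the block spans in the original message.
function extractBase64Blocks(mime) {
  const base64Regex = /Content-Transfer-Encoding: base64([\s\S]*?)\s*?--/g;
  // matchAll requires the g flag; capture group 1 holds the raw body.
  // Removing all whitespace joins the wrapped lines into one string.
  return Array.from(mime.matchAll(base64Regex), (m) => m[1].replace(/\s+/g, ''));
}

// The message from the question, joined with CRLF as is typical for MIME.
const sample = [
  '--------------ra650umTsDNeI5lwXmFy5luF',
  'Content-Type: text/plain; charset=UTF-8; format=flowed',
  'Content-Transfer-Encoding: base64',
  '',
  'TG9yZW0gSXBzdW0NCg0KSGVyZSBpcyBzb21lIG1vcmUgdGV4dA0KDQpOb3cgb24gYSAzcmQg',
  'bGluZQ0KDQoNClRoYW5rcw0KDQo=',
  '',
  '--------------ra650umTsDNeI5lwXmFy5luF--',
].join('\r\n');

const blocks = extractBase64Blocks(sample);
console.log(blocks.length); // 1

// Assumption: Node.js. In a browser, atob() could be used instead.
const decoded = Buffer.from(blocks[0], 'base64').toString('utf8');
console.log(decoded);
```

Because the pattern is anchored on the Content-Transfer-Encoding header rather than on base64-looking lines, a PGP signature block elsewhere in the message would not be picked up.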

CodePudding user response:

Instead of looking for base64 characters, I'd look for all characters (including newlines) between the start and end of the MIME part's body.

By default, . in JavaScript regexes won't match line breaks, even in multi-line mode (the m flag only changes how ^ and $ behave). The s flag allows . to match line breaks.

With this method, you can remove the line breaks from the match afterwards with a simple replace():

const str = `--------------ra650umTsDNeI5lwXmFy5luF
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: base64

TG9yZW0gSXBzdW0NCg0KSGVyZSBpcyBzb21lIG1vcmUgdGV4dA0KDQpOb3cgb24gYSAzcmQg
bGluZQ0KDQoNClRoYW5rcw0KDQo=

--------------ra650umTsDNeI5lwXmFy5luF--`

const payload = str.match(/base64\n\n(.+)\n\n--------------.+/ms)[1].replace(/\n/g, '')

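You might also be better off using a dedicated MIME parser rather than a regex, since multipart messages like this follow a standard format. Note that body-parser handles HTTP request bodies and does not parse multipart MIME content.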
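
The one-line match above assumes LF line endings and a single base64 part. As a rough sketch of how the same idea might be extended, the version below accepts CRLF as well and uses a lazy quantifier so that each part becomes its own match. The helper name extractBase64Parts and the two-part sample message are hypothetical, and the sketch assumes every base64 body is followed by a blank line and then a boundary line starting with --.

```javascript
// Hypothetical helper: returns the base64 body of every part that declares
// a base64 transfer encoding, with line breaks removed.
function extractBase64Parts(mime) {
  // \r?\n accepts both LF and CRLF line endings.
  // (.+?) is lazy, so each match stops at the first blank line followed by "--".
  // The s flag lets . match line breaks; g is needed for matchAll.
  const partRegex = /base64\r?\n\r?\n(.+?)\r?\n\r?\n--/gs;
  return Array.from(mime.matchAll(partRegex), (m) => m[1].replace(/\r?\n/g, ''));
}

// Made-up message with two base64 parts ("Hello" and "World"), CRLF-delimited.
const twoParts = [
  '--b',
  'Content-Type: text/plain',
  'Content-Transfer-Encoding: base64',
  '',
  'SGVsbG8=',
  '',
  '--b',
  'Content-Type: text/plain',
  'Content-Transfer-Encoding: base64',
  '',
  'V29y',
  'bGQ=',
  '',
  '--b--',
].join('\r\n');

const parts = extractBase64Parts(twoParts);
console.log(parts); // [ 'SGVsbG8=', 'V29ybGQ=' ]
```

With the greedy (.+) of the original, a message containing several parts would be captured as one long match running to the last boundary, which is why the lazy form is used here.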
