I have the following from a MIME message:
--------------ra650umTsDNeI5lwXmFy5luF
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: base64
TG9yZW0gSXBzdW0NCg0KSGVyZSBpcyBzb21lIG1vcmUgdGV4dA0KDQpOb3cgb24gYSAzcmQg
bGluZQ0KDQoNClRoYW5rcw0KDQo=
--------------ra650umTsDNeI5lwXmFy5luF--
I want to extract the base64 encoded message, regardless of how many lines it is.
The following will indeed find matches on each individual line, but how can I group them so that consecutive lines of matching base64 are returned together as one match?
var base64Regex = /^(?:[A-Za-z0-9+\/]{4})*(?:[A-Za-z0-9+\/]{4}|[A-Za-z0-9+\/]{3}=|[A-Za-z0-9+\/]{2}={2})$/gm
When the MIME content also contains, for example, a PGP signature, this gives me 4 or 5 matches, so I can't simply join them all, because the regex finds the signature's base64 as well.
Ideally I'd modify this so that it captures everything from the first matching line (inclusive) up to the ---------- boundary and calls that "match 1"; if it finds another block of base64, that is "match 2", and so on.
Here is a link to regex101 showing 2 matches. In short, I would like for this to be one match.
https://regex101.com/r/32WjKa/1
CodePudding user response:
Would this help?
var base64Regex = /Content-Transfer-Encoding: base64([\s\S]*?)\s*?--/g;
Content-Transfer-Encoding: base64 - This is the start of the base64 encoded message.
[\s\S]*? - This is the base64 encoded message. It can be on multiple lines.
\s*?-- - This is the end of the base64 encoded message.
g - This is the global flag, so that it will match all instances of the regex.
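As a rough sketch of how that pattern could be applied, the snippet below uses matchAll so that each base64 part comes back as a single joined string. The helper name extractBase64Blocks and the shortened sample message are made up for illustration, and the sketch assumes Content-Transfer-Encoding is the last header of each part and that the body itself never contains "--".

```javascript
// Hypothetical helper: returns one string per base64-encoded MIME part.
function extractBase64Blocks(mime) {
  const re = /Content-Transfer-Encoding: base64([\s\S]*?)\s*?--/g;
  // Capture group 1 is everything between the header and the next "--".
  // Stripping all whitespace joins the wrapped lines into one string.
  return Array.from(mime.matchAll(re), (m) => m[1].replace(/\s+/g, ''));
}

// Shortened, made-up sample with two base64 parts.
const sample = [
  '--boundary',
  'Content-Type: text/plain; charset=UTF-8',
  'Content-Transfer-Encoding: base64',
  '',
  'TG9yZW0gSXBz',
  'dW0=',
  '--boundary',
  'Content-Type: text/plain; charset=UTF-8',
  'Content-Transfer-Encoding: base64',
  '',
  'VGhhbmtz',
  '--boundary--',
].join('\n');

const blocks = extractBase64Blocks(sample);
console.log(blocks); // one entry per base64 part
```

Each array entry corresponds to one "match" in the sense of the question, so a PGP signature or a second attachment would show up as its own entry rather than being mixed into the first.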
CodePudding user response:
Instead of looking for base64 characters, I'd look for all characters (including newlines) between the headers and the closing boundary of the MIME part.
By default, . in JavaScript regexes won't match line breaks, even in multi-line mode (the m flag only changes how ^ and $ behave). But the s flag allows . to match line breaks.
With this method, you can remove the line breaks after you match with a simple replace():
const str = `--------------ra650umTsDNeI5lwXmFy5luF
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: base64

TG9yZW0gSXBzdW0NCg0KSGVyZSBpcyBzb21lIG1vcmUgdGV4dA0KDQpOb3cgb24gYSAzcmQg
bGluZQ0KDQoNClRoYW5rcw0KDQo=
--------------ra650umTsDNeI5lwXmFy5luF--`
// The blank line after the headers is what "base64\n\n" anchors on.
// (.+?) is lazy, so it stops at the first boundary line that follows.
const payload = str.match(/base64\n\n(.+?)\n+--------------/s)[1].replace(/\n/g, '')
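If the goal is the decoded text rather than the raw base64, the same idea can be wrapped in a small function. This is only a sketch: the name decodeFirstBase64Part is invented here, it assumes a Node.js environment (for the Buffer global), and it assumes the part is UTF-8 text as the Content-Type header in the question says.

```javascript
// Hypothetical helper: extracts the first base64 body and decodes it to text.
function decodeFirstBase64Part(mime) {
  // \r?\n tolerates both LF and CRLF line endings; the s flag lets . span lines.
  const m = mime.match(/base64\r?\n\r?\n(.+?)\r?\n--/s);
  if (m === null) return null; // no base64 part found
  const payload = m[1].replace(/\s+/g, ''); // join the wrapped lines
  // Buffer is a Node.js global; this sketch assumes Node, not a browser.
  return Buffer.from(payload, 'base64').toString('utf8');
}

const message = [
  '--------------ra650umTsDNeI5lwXmFy5luF',
  'Content-Type: text/plain; charset=UTF-8; format=flowed',
  'Content-Transfer-Encoding: base64',
  '',
  'TG9yZW0gSXBzdW0NCg0KSGVyZSBpcyBzb21lIG1vcmUgdGV4dA0KDQpOb3cgb24gYSAzcmQg',
  'bGluZQ0KDQoNClRoYW5rcw0KDQo=',
  '--------------ra650umTsDNeI5lwXmFy5luF--',
].join('\n');

const decoded = decodeFirstBase64Part(message);
console.log(JSON.stringify(decoded)); // JSON.stringify makes the \r\n visible
```

Returning null instead of indexing [1] directly avoids a TypeError when the message has no base64 part.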
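You might also be better off using a dedicated MIME parsing library rather than a regex, since messages like this follow a standard format. Note that body-parser targets HTTP request bodies and does not handle multipart content, so it is not a good fit here.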