I'm writing a regex to replace runs of asterisks at the end of a word with a superscript number giving how many asterisks there were (so b*** becomes b³), as well as asterisks at the beginning of a line that are followed by a space and then a word. That is, I'm making it easy to write footnotes on my phone: write the thing → send the thing to an iOS Shortcut → regex magic → the thing has footnote markers.
However, since I regularly use *foo bar* to denote emphasis, I don't want to capture those asterisks.
I thought I had it with this regex:
/**
* (?<=\S) -- make sure the thing behind the capture is a not-space
* (?<!\W\*\w([^*]|\w\*)*?) -- make sure the thing behind the capture is not a not-word character
* followed by an asterisk
* followed by anything that isn't an asterisk
* followed by a letter followed by an asterisk
* e.g. Hello *world*.
* \*+ -- 1+ asterisks. The primary capture for trailing asterisks.
* (?=[^\w*]|$) -- make sure the thing following the capture is a not-word-not-asterisk,
* and may be the end of the line
* | -- OR
* ^\*+(?=\s\S) -- the start of a line followed by 1+ asterisks (the primary capture)
* followed by a space
* followed by a not-space
*/
const regex = /(?<=\S)(?<!\W\*\w([^*]|\w\*)*?)\*+(?=[^\w*]|$)|^\*+(?=\s\S)/gm;
const transform = m => {
  const superTable = [
    '⁰', '¹', '²', '³', '⁴', '⁵', '⁶', '⁷', '⁸', '⁹'
  ];
  let str = [];
  // for each digit, add the character for the 1s place then divide by ten
  for (let len = m.length; len; len = (len - len % 10) / 10) {
    str.unshift(superTable[len % 10]);
  }
  return str.join('');
};
/** [input, expectedOutput] */
const testCases = [
[`A b*** c`, `A b³ c`],
[`A *b* c*`, `A *b* c¹`],
[`A *b* *c* d*`, `A *b* *c* d¹`],
[`A *b* c* d**`, `A *b* c¹ d²`],
[`** a b c`, `² a b c`],
[`** a b*** c`, `² a b³ c`],
[`A *bc* d**`, `A *bc* d²`],
[`A *b c* d**`, `A *b c* d²`],
];
const results = ['Input\t\t=>\tActual\t\t===\tExpected\t: Success'];
results.push('='.repeat(73));
for (const [input, expected] of testCases) {
  const actual = input.replace(regex, transform);
  const extraSpacing = actual.length < 8 ? '\t' : '';
  const success = actual === expected;
  results.push(`${input}\t=>\t${actual}${extraSpacing}\t===\t${expected}${extraSpacing}\t: ${success}`);
}
console.log(results.join('\n'));
The first six were the test cases I used when I first wrote the script; the last two I discovered today. It turns out the regex works fine for *a* (a single character wrapped in asterisks) but not for *ab* or *a b* (more than one character wrapped in asterisks): in those two cases the trailing d** is not replaced at all.
I can't for the life of me figure out what I've done wrong, though admittedly I wrote this regex weeks ago. I suspect it has to do with either greediness or laziness, but I'm not sure where.
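Testing the body of the negative lookbehind on its own, against the text that sits to the left of the trailing asterisks, shows where the pattern goes wrong. A negative lookbehind fails whenever some match of its pattern ends at the current position, and anchoring that pattern with $ to the prefix string reproduces exactly that check. The helper name blockedBefore is illustrative, not part of the original script.

```javascript
// Body of the question's negative lookbehind (?<!\W\*\w([^*]|\w\*)*?),
// anchored with $ so it can be tested against "the text before the asterisks".
const lookbehindBody = /\W\*\w([^*]|\w\*)*?$/;

// Hypothetical helper: true when the original regex would REFUSE to treat
// asterisks placed right after `prefix` as a footnote marker.
const blockedBefore = prefix => lookbehindBody.test(prefix);

// Single-character emphasis: \w consumes "b", and the closing "*" can be
// matched by neither [^*] nor \w\*, so the lookbehind fails and the
// footnote asterisks after "d" are allowed.
console.log(blockedBefore('A *b* d'));   // false

// Longer emphasis: \w\* consumes "c*" (the closing asterisk), so the
// lookbehind keeps matching through " d" and blocks the footnote.
console.log(blockedBefore('A *bc* d'));  // true
console.log(blockedBefore('A *b c* d')); // true
```

So greediness versus laziness is not the issue: a lookbehind only asks whether any match exists, and the lazy *? does not change that. The culprit is the \w\* alternative, which lets the group step over the closing asterisk of any emphasis span longer than one character, after which everything up to the footnote asterisks still looks like it is inside emphasis.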
Answer:
You can use
/^\*+(?=\s+\S)|(?<!\s)(?<!\*[^*\s]*)\*+(?![\w*])/gm
See the regex demo. Details:
- ^ - start of a line
- \*+(?=\s+\S) - one or more asterisks followed by one or more whitespace chars and then a non-whitespace char
- | - or
- (?<!\s) - immediately on the left, there can be no whitespace char (if you worked with word chars, \w, you could use \b here)
- (?<!\*[^*\s]*) - immediately on the left, there can be no * followed by zero or more chars other than asterisks and whitespace
- \*+ - one or more asterisks
- (?![\w*]) - immediately on the right, there can be no word or * chars.
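The key piece is the second lookbehind, (?<!\*[^*\s]*). A minimal way to see what it does is to test its body, anchored with $, against the text to the left of a run of asterisks; the helper name blocked is illustrative only.

```javascript
// Body of the answer's lookbehind (?<!\*[^*\s]*), anchored with $ so it can
// be tested against the text to the left of a run of asterisks.
const answerLookbehindBody = /\*[^*\s]*$/;

// Hypothetical helper: true when the answer's regex would NOT replace
// asterisks placed right after `prefix`.
const blocked = prefix => answerLookbehindBody.test(prefix);

// Closing asterisk of an emphasis span: an opening "*" followed only by
// non-space, non-asterisk chars sits directly to the left, so it is blocked.
console.log(blocked('A *b'));     // true
console.log(blocked('A *bc'));    // true

// Footnote asterisks after a later word: the space between "*bc*" and "d"
// stops [^*\s]*, so no opening "*" is reachable and the marker is replaced.
console.log(blocked('A *bc* d')); // false
```

Unlike the original lookbehind, [^*\s]* can never step over an asterisk, so it cannot run past the end of an emphasis span.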
Here is your updated JavaScript demo:
const regex = /^\*+(?=\s+\S)|(?<!\s)(?<!\*[^*\s]*)\*+(?![\w*])/gm;
const transform = m => {
  const superTable = [
    '⁰', '¹', '²', '³', '⁴', '⁵', '⁶', '⁷', '⁸', '⁹'
  ];
  let str = [];
  // for each digit, add the character for the 1s place then divide by ten
  for (let len = m.length; len; len = (len - len % 10) / 10) {
    str.unshift(superTable[len % 10]);
  }
  return str.join('');
};
/** [input, expectedOutput] */
const testCases = [
[`A b*** c`, `A b³ c`],
[`A *b* c*`, `A *b* c¹`],
[`A *b* *c* d*`, `A *b* *c* d¹`],
[`A *b* c* d**`, `A *b* c¹ d²`],
[`** a b c`, `² a b c`],
[`** a b*** c`, `² a b³ c`],
[`A *bc* d**`, `A *bc* d²`],
];
const results = ['Input\t\t=>\tActual\t\t===\tExpected\t: Success'];
results.push('='.repeat(73));
for (const [input, expected] of testCases) {
  const actual = input.replace(regex, transform);
  const extraSpacing = actual.length < 8 ? '\t' : '';
  const success = actual === expected;
  results.push(`${input}\t=>\t${actual}${extraSpacing}\t===\t${expected}${extraSpacing}\t: ${success}`);
}
console.log(results.join('\n'));
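A more compact, assertion-style check of the same pattern, run against all eight cases from the question, including A *b c* d** (which the list above leaves out). The transform here is a variation on the original: it reads the digits from String(m.length) instead of dividing by ten; SUPERSCRIPT_DIGITS and failures are names introduced only for this sketch. With this pattern, seven cases pass; in the eighth, the closing asterisk of *b c* is also replaced, because the space inside the span stops [^*\s]* before it can reach the opening asterisk.

```javascript
// The answer's pattern.
const regex = /^\*+(?=\s+\S)|(?<!\s)(?<!\*[^*\s]*)\*+(?![\w*])/gm;

// The match length written with superscript digits; String(n) yields the
// decimal digits directly, so no manual division by ten is needed.
const SUPERSCRIPT_DIGITS = ['⁰', '¹', '²', '³', '⁴', '⁵', '⁶', '⁷', '⁸', '⁹'];
const transform = m =>
  String(m.length).split('').map(d => SUPERSCRIPT_DIGITS[Number(d)]).join('');

// [input, expectedOutput]: all eight cases from the question.
const testCases = [
  ['A b*** c', 'A b³ c'],
  ['A *b* c*', 'A *b* c¹'],
  ['A *b* *c* d*', 'A *b* *c* d¹'],
  ['A *b* c* d**', 'A *b* c¹ d²'],
  ['** a b c', '² a b c'],
  ['** a b*** c', '² a b³ c'],
  ['A *bc* d**', 'A *bc* d²'],
  ['A *b c* d**', 'A *b c* d²'],
];

// Keep only the cases whose actual output differs from the expected one.
const failures = testCases
  .map(([input, expected]) => ({ input, expected, actual: input.replace(regex, transform) }))
  .filter(({ actual, expected }) => actual !== expected);

for (const { input, actual, expected } of failures) {
  console.log(`${input} => ${actual} (expected ${expected})`);
}
// Prints one line: A *b c* d** => A *b c¹ d² (expected A *b c* d²)
```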