Regex to capture trailing asterisks but not wrapping asterisks

I'm writing a regex that replaces asterisks at the end of a word with a superscript number giving the count of those asterisks, and does the same for asterisks at the beginning of a line that are followed by a space and then a word. That is, I'm making it easy to write footnotes on my phone. Write the thing → send the thing to an iOS Shortcut → regex magic → the thing has footnote markers.

However, since I regularly use *foo bar* to denote emphasis, I don't want to capture those asterisks.

I thought I had it with this regex:

/**
 * (?<=\S)                  -- make sure the thing behind the capture is a not-space
 * (?<!\W\*\w([^*]|\w\*)*?) -- make sure the thing behind the capture is not a not-word character
 *                             followed by an asterisk
 *                             followed by anything that isn't an asterisk
 *                             followed by a letter followed by an asterisk
 *                             e.g. Hello *world*.
 * \*+                      -- 1+ asterisks.  The primary capture for trailing asterisks.
 * (?=[^\w*]|$)             -- make sure the thing following the capture is a not-word-not-asterisk,
 *                             and may be the end of the line
 * |                        -- OR
 * ^\*+(?=\s\S)             -- the start of a line followed by 1+ asterisks (the primary capture)
 *                             followed by a space
 *                             followed by a not-space
 */
const regex = /(?<=\S)(?<!\W\*\w([^*]|\w\*)*?)\*+(?=[^\w*]|$)|^\*+(?=\s\S)/gm;

const transform = m => {
  const superTable = [
    '⁰', '¹', '²', '³', '⁴', '⁵', '⁶', '⁷', '⁸', '⁹'
  ];

  let str = [];

  // for each digit, add the character for the 1s place then divide by ten
  for (let len = m.length; len; len = (len - len % 10) / 10) {
    str.unshift(superTable[len % 10]);
  }

  return str.join('');
}

/** [input, expectedOutput] */
const testCases = [
  [`A b*** c`, `A b³ c`],
  [`A *b* c*`, `A *b* c¹`],
  [`A *b* *c* d*`, `A *b* *c* d¹`],
  [`A *b* c* d**`, `A *b* c¹ d²`],
  [`** a b c`, `² a b c`],
  [`** a b*** c`, `² a b³ c`],
  [`A *bc* d**`, `A *bc* d²`],
  [`A *b c* d**`, `A *b c* d²`],
];

const results = ['Input\t\t=>\tActual\t\t===\tExpected\t: Success'];
results.push('='.repeat(73));

for (const [input, expected] of testCases) {
  const actual = input.replace(regex, transform);
  const extraSpacing = actual.length < 8 ? '\t' : '';
  const success = actual === expected;
  results.push(`${input}\t=>\t${actual}${extraSpacing}\t===\t${expected}${extraSpacing}\t: ${success}`);
}

console.log(results.join('\n'));

The first six were the test cases I used when I first wrote the script; the last two I discovered today. It turns out the regex works fine for *a* (a single character wrapped in asterisks) but not for *ab* or *a b* (two or more characters wrapped in asterisks).

I can't for the life of me figure out what I've done wrong, though admittedly I wrote this regex weeks ago. I suspect it has to do with either greediness or laziness, but I'm not sure where.

CodePudding user response：

You can use

/^\*+(?=\s+\S)|(?<!\s)(?<!\*[^*\s]*)\*+(?![\w*])/gm

Details:

^ - start of a line
\*+(?=\s+\S) - one or more asterisks followed by one or more whitespace chars and then a non-whitespace char
| - or
(?<!\s) - immediately on the left, there can be no whitespace char (if you worked with word chars, \w, you could use \b here)
(?<!\*[^*\s]*) - immediately on the left, there can be no * followed by zero or more chars other than asterisks and whitespace
\*+ - one or more asterisks
(?![\w*]) - immediately on the right, there can be no word char and no *.
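
To see what each lookaround contributes, the pieces can be tried one at a time. This is a minimal sketch, assuming a JavaScript engine with lookbehind and String.prototype.matchAll support (for example a recent Node.js). The find helper and the sample strings are illustrative and not part of the original answer.

```javascript
// Hypothetical helper: list every match as "index:text".
const find = (re, s) => [...s.matchAll(re)].map(m => `${m.index}:${m[0]}`);

// Asterisk runs not followed by a word char or another asterisk.
// The closing asterisk of *bc* is still picked up here.
console.log(find(/\*+(?![\w*])/g, 'A *bc* d**'));               // ['5:*', '8:**']

// Adding (?<!\*[^*\s]*) rejects an asterisk that closes a *word*.
console.log(find(/(?<!\*[^*\s]*)\*+(?![\w*])/g, 'A *bc* d**')); // ['8:**']

// (?<!\s) rejects a free-standing asterisk preceded by whitespace.
console.log(find(/\*+(?![\w*])/g, 'a * b'));                    // ['2:*']
console.log(find(/(?<!\s)\*+(?![\w*])/g, 'a * b'));             // []

// The first alternative handles markers at the start of a line.
console.log(find(/^\*+(?=\s+\S)/gm, '** a b'));                 // ['0:**']
```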

Here is your updated JavaScript demo:

const regex = /^\*+(?=\s+\S)|(?<!\s)(?<!\*[^*\s]*)\*+(?![\w*])/gm;

const transform = m => {
  const superTable = [
    '⁰', '¹', '²', '³', '⁴', '⁵', '⁶', '⁷', '⁸', '⁹'
  ];

  let str = [];

  // for each digit, add the character for the 1s place then divide by ten
  for (let len = m.length; len; len = (len - len % 10) / 10) {
    str.unshift(superTable[len % 10]);
  }

  return str.join('');
}

/** [input, expectedOutput] */
const testCases = [
  [`A b*** c`, `A b³ c`],
  [`A *b* c*`, `A *b* c¹`],
  [`A *b* *c* d*`, `A *b* *c* d¹`],
  [`A *b* c* d**`, `A *b* c¹ d²`],
  [`** a b c`, `² a b c`],
  [`** a b*** c`, `² a b³ c`],
  [`A *bc* d**`, `A *bc* d²`],
];

const results = ['Input\t\t=>\tActual\t\t===\tExpected\t: Success'];
results.push('='.repeat(73));

for (const [input, expected] of testCases) {
  const actual = input.replace(regex, transform);
  const extraSpacing = actual.length < 8 ? '\t' : '';
  const success = actual === expected;
  results.push(`${input}\t=>\t${actual}${extraSpacing}\t===\t${expected}${extraSpacing}\t: ${success}`);
}

console.log(results.join('\n'));
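
The test list above leaves out the last case from the question, `A *b c* d**`, where the emphasis span contains a space. Because the lookbehind (?<!\*[^*\s]*) stops at whitespace, that case seems to need something more. One possible approach, offered here as a sketch under stated assumptions and not as part of the pattern above, is to match whole emphasis spans with a capture group and return them unchanged, so that only the remaining asterisk runs are converted. The names footnoteRegex, toSuperscript, and addFootnoteMarkers are illustrative.

```javascript
// Sketch only: all names below are hypothetical, not from the original answer.
// Alternatives, tried in order at each position:
//   1. ^\*+(?=\s+\S)                           footnote marker at the start of a line
//   2. ((?<!\S)\*(?=[^\s*])[^*\n]*(?<=\S)\*)   a whole *emphasis span* (captured, kept as-is)
//   3. (?<=\S)\*+(?![\w*])                     trailing asterisks after a word
const footnoteRegex =
  /^\*+(?=\s+\S)|((?<!\S)\*(?=[^\s*])[^*\n]*(?<=\S)\*)|(?<=\S)\*+(?![\w*])/gm;

const SUPERSCRIPTS = ['⁰', '¹', '²', '³', '⁴', '⁵', '⁶', '⁷', '⁸', '⁹'];

// Convert a count to superscript digits, one digit at a time.
const toSuperscript = n =>
  [...String(n)].map(d => SUPERSCRIPTS[Number(d)]).join('');

// If the capture group matched, the match is an emphasis span: leave it alone.
// Otherwise the match is a run of asterisks: replace it with its length.
const addFootnoteMarkers = text =>
  text.replace(footnoteRegex, (match, emphasis) =>
    emphasis !== undefined ? match : toSuperscript(match.length));

console.log(addFootnoteMarkers('A *b c* d**')); // A *b c* d²
console.log(addFootnoteMarkers('A *b* c* d**')); // A *b* c¹ d²
```

This sketch assumes an emphasis span opens after whitespace (or at the start of a line) and stays on one line. An asterisk that follows whitespace, directly precedes a non-space character, and is later followed by another asterisk on the same line may be treated as the start of an emphasis span.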