Splitting a multiline string by regex has unwanted elements in result array

Time:12-05

I'm trying to split a multiline string using a regex.

const regexCommitSha = RegExp(/([0-9a-f]{7})\s/m)
const result = commits.split(regexCommitSha)
console.log(result)

This is my multiline string (commits):

1234567 fix: simple bug fix
apps/backend/src/lib/file.ts
1234567 fix: second bug fix
apps/backend/src/lib/file.ts
apps/frontend/src/lib/file.ts
1234567 feat: new feature
apps/frontend/src/lib/file.ts
1234567 feat: second feature
apps/frontend/src/lib/file.ts

And this is my result:

[
  '',
  '1234567',
  'fix: simple bug fix\napps/backend/src/lib/file.ts\n',
  '1234567',
  'fix: second bug fix\n' +
    'apps/backend/src/lib/file.ts\n' +
    'apps/frontend/src/lib/file.ts\n',
  '1234567',
  'feat: new feature\napps/frontend/src/lib/file.ts\n',
  '1234567',
  'feat: second feature\napps/frontend/src/lib/file.ts'
]

Why is there an empty string as the first element, and why are there '1234567' elements in my result array? Since that is my separator, I thought it would not appear in the result.

I would expect

[
  'fix: simple bug fix\napps/backend/src/lib/file.ts\n',
  'fix: second bug fix\n' +
    'apps/backend/src/lib/file.ts\n' +
    'apps/frontend/src/lib/file.ts\n',
  'feat: new feature\napps/frontend/src/lib/file.ts\n',
  'feat: second feature\napps/frontend/src/lib/file.ts'
]

What am I doing wrong?

CodePudding user response:

When the regular expression passed to .split contains a capture group, the captured text is included in the resulting array. Since you don't want the numbers included, don't put them in a capture group. (The group isn't accomplishing anything useful anyway.) You also need to exclude the initial empty string, which appears because the input starts with a match, so nothing lies between the start of the string and the first separator. That has to be done by filtering the array afterwards.

const input = `1234567 fix: simple bug fix
apps/backend/src/lib/file.ts
1234567 fix: second bug fix
apps/backend/src/lib/file.ts
apps/frontend/src/lib/file.ts
1234567 feat: new feature
apps/frontend/src/lib/file.ts
1234567 feat: second feature
apps/frontend/src/lib/file.ts`;
const result = input.split(/[\da-f]{7}\s/m).filter(Boolean);
console.log(result)
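
One caveat with the pattern above: it is not anchored, so seven hex characters followed by whitespace anywhere in a line (for example inside a commit subject) would also act as a separator. Below is a minimal sketch of an anchored variant, assuming every commit line starts with the 7-character short SHA. `splitCommits` and the sample log are hypothetical names and data used for illustration, not part of the question.

```javascript
// Hypothetical helper: split a log into one chunk per commit.
// Assumes each commit line starts with a 7-character hex SHA plus one whitespace character.
function splitCommits(log) {
  return log
    .split(/^[0-9a-f]{7}\s/m) // ^ with the m flag matches at the start of every line
    .filter(Boolean);         // drop the empty string before the first SHA
}

const sample = `1234567 fix: simple bug fix
apps/backend/src/lib/file.ts
abcdef0 feat: new feature
apps/frontend/src/lib/file.ts`;

console.log(splitCommits(sample));
// [
//   'fix: simple bug fix\napps/backend/src/lib/file.ts\n',
//   'feat: new feature\napps/frontend/src/lib/file.ts'
// ]
```

Anchoring with `^` only makes sense together with the `m` flag. Without it, `^` matches only at the very start of the input.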

If you want an array of strings that match a pattern, in my experience .match is usually a bit more predictable and reliable: use a lookbehind for the seven digits and the space, then lazily match everything (including newlines) up to the next line that starts with seven digits, or up to the end of the string.

const input = `1234567 fix: simple bug fix
apps/backend/src/lib/file.ts
1234567 fix: second bug fix
apps/backend/src/lib/file.ts
apps/frontend/src/lib/file.ts
1234567 feat: new feature
apps/frontend/src/lib/file.ts
1234567 feat: second feature
apps/frontend/src/lib/file.ts`;
const matches = input.match(/(?<=\d{7} ).+?(?=\n\d{7}|$)/gs);
console.log(matches);

CodePudding user response:

You can replace the regex with

// No grouping in the regex
const regexCommitSha = RegExp(/[0-9a-f]{7}\s/m)

This removes the 1234567 entries from the result. Including them is how .split behaves when the regex has capturing groups (for more details, see: "javascript regex split produces too many items").

As for why you're getting an empty string in the result, that's how JavaScript's split works when the string begins with the separator. Take a look at this example:

const s = "a1bbbc"
s.split("a1")
// ["", "bbbc"]
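
Both behaviours can be checked in isolation. The following is a small sketch with made-up input strings (not taken from the question) showing what .split returns with and without a capture group, and what happens when the input starts with a match:

```javascript
// Without a capture group, the matched separator is dropped.
const plain = "x1y2z".split(/\d/);
console.log(plain); // [ 'x', 'y', 'z' ]

// With a capture group, the captured text is spliced into the result.
const captured = "x1y2z".split(/(\d)/);
console.log(captured); // [ 'x', '1', 'y', '2', 'z' ]

// If the input starts with a match, the first element is an empty string.
const leadingMatch = "1a2b".split(/(\d)/);
console.log(leadingMatch); // [ '', '1', 'a', '2', 'b' ]
```

The last case has the same shape as the result in the question: an empty string first, then captured separators alternating with the text between them.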

CodePudding user response:

You don't have to do it all in one operation.

  1. Split by \n
  2. Remove any leading numbers
  3. Filter out any empty string
  4. Group the lines as you please

const input = `
1234567 fix: simple bug fix
apps/backend/src/lib/file.ts
1234567 fix: second bug fix
apps/backend/src/lib/file.ts
apps/frontend/src/lib/file.ts
1234567 feat: new feature
apps/frontend/src/lib/file.ts
1234567 feat: second feature
apps/frontend/src/lib/file.ts
`;

const lines = input
  .split('\n') // break into lines
  .map(line => line.replace(/^\d+/, '').trim()) // remove leading numbers
  .filter(line => line.length > 0); // remove empty strings

const output = lines
  .reduce((groups, line) => {
    if (/^\w+:/.test(line)) {
      groups.push(line);
    }
    else {
      groups[groups.length-1] += '\n' + line;
    }
    return groups;
  }, []);

console.log(output);
