Grouping lines with the same leading word using the JavaScript regex engine

Time:06-27

Suppose you have the following multi-line string:

C1 10
C2 20
C3 30
C2 40
C4 50
C3 60

And you want to merge the lines that share the same leading word, so as to build the following result:

C1 10
C2 20 40
C3 30 60
C4 50

I am trying to figure out a solution with pure Regex, but I am stuck. Any help?

I tried the regex below, but it didn't work:

Regex: /(^\w+\b)(.*$)([\s\S]*?\n)(\1)(.*$)/gm

Substitution: $1$2$5$3

Result:

C1 10
C2 20 40
C3 30

C4 50
C3 60

As you can see, it only works with the first occurrence, despite the fact that I have used a lazy quantifier in the third capturing group.

Any help?

CodePudding user response:

You can use /(^\w+\b)\s(.*$)/gm to capture the leading word and the rest of each line, then build the expected format in JavaScript.

let result = {};
let text = `C1 10
C2 20
C3 30
C2 40
C4 50
C3 60`;

// Collect the remainder of each line under its leading word.
Array.from(text.matchAll(/(^\w+\b)\s(.*$)/gm)).forEach(([_, group, item]) => {
    if (!result[group]) result[group] = [];
    result[group].push(item);
});

// Print one merged line per leading word.
Object.entries(result).forEach(([group, items]) => console.log(group, items.join(' ')));
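
If the merged text is needed as a single string rather than console output, the same idea can be sketched with a Map. A Map keeps keys in first-seen order, whereas a plain object lists integer-like keys (for example "10" and "2") in ascending numeric order first, which would reorder such groups. The helper name groupLines and the use of [ \t]+ as the separator are assumptions made for this illustration, not part of the answer above.

```javascript
// Hypothetical helper: groups lines by their leading word and returns the merged text.
function groupLines(text) {
  const groups = new Map(); // a Map preserves first-seen key order
  // [ \t]+ (rather than \s) keeps the separator from spanning a line break.
  for (const [, key, rest] of text.matchAll(/^(\w+)[ \t]+(.*)$/gm)) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(rest);
  }
  // Rebuild one line per leading word.
  return Array.from(groups, ([key, items]) => `${key} ${items.join(" ")}`).join("\n");
}

const sample = "C1 10\nC2 20\nC3 30\nC2 40\nC4 50\nC3 60";
console.log(groupLines(sample));
```

Lines without a separator and a second field are skipped by this pattern, so it only suits input shaped like the sample.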

CodePudding user response:

You could also accomplish this using reduce()

const data = `C1 10
C2 20
C3 30
C2 40
C4 50
C3 60`;

const result = data.split("\n").reduce((acc, val) => {
  const vals = val.split(" ");
  if (!acc[vals[0]]) acc[vals[0]] = vals[1];
  else acc[vals[0]] += ` ${vals[1]}`;
  return acc;
}, {});

console.log(result);
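
On the regex-only attempt in the question: a global replace scans left to right and never revisits text inside an earlier match. The first match runs from "C2 20" to "C2 40" and swallows "C3 30" along the way, so "C3 30" is never tried as a starting point. The stray blank line appears because the newline before the duplicate line is captured in group 3 and then moved along with it. One possible workaround, sketched below, is to re-apply a slightly adjusted pattern until the text stops changing. The function name mergeByRegex and the adjusted pattern are illustrative assumptions, and the sketch assumes "\n" line endings.

```javascript
// Hypothetical helper: merges lines with the same leading word by
// re-running one regex replacement until the text no longer changes.
function mergeByRegex(text) {
  // $1 leading word, $2 rest of its line, $3 lines in between,
  // $4 rest of the later line that starts with the same word.
  // The "\n" before \1 sits outside the groups, so it is dropped, not moved.
  const pattern = /^(\w+)\b(.*)$([\s\S]*?)\n\1\b(.*)$/gm;
  let previous;
  do {
    previous = text;
    text = text.replace(pattern, "$1$2$4$3");
  } while (text !== previous);
  return text;
}

console.log(mergeByRegex("C1 10\nC2 20\nC3 30\nC2 40\nC4 50\nC3 60"));
```

Each pass rescans the whole string, so this is likely to be slower than the grouping approaches above on large inputs.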
