Suppose you have the following multi-line string:
C1 10
C2 20
C3 30
C2 40
C4 50
C3 60
And you want to match only those lines which have the same leading word, so as to build the following result:
C1 10
C2 20 40
C3 30 60
C4 50
I am trying to figure out a solution with pure Regex, but I am stuck. Any help?
I tried the regex below, but it didn't work:
Regex: /(^\w+\b)(.*$)([\s\S]*?\n)(\1)(.*$)/gm
Substitution: $1$2$5$3
Result:
C1 10
C2 20 40
C3 30
C4 50
C3 60
As you can see, it only works with the first occurrence, despite the fact that I have used a lazy quantifier in the third capturing group.
Any help?
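A likely reason only the first occurrence gets merged: matches found by a global replace never overlap. The first match runs from `C2 20` through `C2 40` and swallows `C3 30` on the way, so the scan resumes after `C2 40` and `C3 30` never gets to start a match of its own. The lazy quantifier only limits how far a single match extends. One possible workaround, sketched below, is to repeat the replacement until the text stops changing. The function name `mergeByLeadingWord` is hypothetical, and the pattern is a slightly adjusted variant of the one above (the newline is kept outside the third group so no blank line is left behind), not the original regex.

```javascript
// Sketch: re-run a regex replacement until nothing changes (a fixed point).
// `mergeByLeadingWord` is a hypothetical helper name, not from the original post.
function mergeByLeadingWord(text) {
  // $1 = leading word, $2 = rest of its line, $3 = lines in between,
  // $4 = rest of the later line that starts with the same word.
  const pattern = /^(\w+)\b(.*)$([\s\S]*?)\n\1\b(.*)$/gm;
  let previous;
  do {
    previous = text;
    // Move the later line's values up to the first line and drop the later line.
    text = text.replace(pattern, "$1$2$4$3");
  } while (text !== previous);
  return text;
}

const input = `C1 10
C2 20
C3 30
C2 40
C4 50
C3 60`;

console.log(mergeByLeadingWord(input));
// C1 10
// C2 20 40
// C3 30 60
// C4 50
```

Each pass merges at most one pair per non-overlapping region, so the loop needs a few passes on inputs with many repeated words. That is fine for short strings but can be slow on large ones.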
CodePudding user response:
You can use /(^\w+\b)\s(.*$)/gm
to capture the needed groups, then build the expected format in JavaScript.
let result = {};
let text = `C1 10
C2 20
C3 30
C2 40
C4 50
C3 60`;
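// Group the captured values by their leading word.
Array.from(text.matchAll(/(^\w+\b)\s(.*$)/gm)).forEach(([_, group, item]) => {
  if (!result[group]) result[group] = [];
  result[group].push(item);
});
// Print each leading word followed by its space-separated values.
Object.entries(result).forEach(([group, items]) => console.log(group, items.join(' ')));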
CodePudding user response:
You could also accomplish this using reduce():
const data = `C1 10
C2 20
C3 30
C2 40
C4 50
C3 60`;
const result = data.split("\n").reduce((acc, val) => {
  const vals = val.split(" ");
  // First value for this key: store it. Later values: append with a space.
  if (!acc[vals[0]]) acc[vals[0]] = vals[1];
  else acc[vals[0]] += ` ${vals[1]}`;
  return acc;
}, {});
console.log(result);
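The snippet above logs an object rather than the multi-line text the question asks for. A minimal sketch of the same reduce() idea that returns a string could look like this. The function name `groupLines` is hypothetical, and the use of a `Map` instead of a plain object is an assumption made here to keep keys in first-seen order.

```javascript
// Sketch: group lines by leading word with reduce() and return the merged text.
// `groupLines` is a hypothetical helper name, not from the original answer.
function groupLines(text) {
  const groups = text.split("\n").reduce((acc, line) => {
    const [key, ...values] = line.trim().split(/\s+/);
    if (!key) return acc; // skip blank lines
    if (!acc.has(key)) acc.set(key, []);
    acc.get(key).push(...values);
    return acc;
  }, new Map());
  // One output line per leading word: the word, then all of its values.
  return Array.from(groups, ([key, values]) => [key, ...values].join(" ")).join("\n");
}

const sample = `C1 10
C2 20
C3 30
C2 40
C4 50
C3 60`;

console.log(groupLines(sample));
// C1 10
// C2 20 40
// C3 30 60
// C4 50
```

A `Map` iterates in insertion order for every key, whereas a plain object lists integer-like keys (such as "10") first in ascending order, which could reorder the output if the leading words were purely numeric.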