I have a string like this:
DOTA (UMA), MIAN (MIAN ISLAND), SOUTH TAKAK (PO) (JAB)
The expected output I want is:
Dota (UMA), Mian (MIAN ISLAND), South Takak (PO) (JAB)
I tried this before:
// make title first
_stringToTitle(str) {
return str.replace(
/\w\S*/g,
function (txt) {
return txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase();
}
)
}
// get bracket
_getBracket(str) {
return str.match(/\(([^)]+)\)/g);
}
// now change to title and construct them all
_changeWordToTitle(str) {
const triLC = this._getBracket(str).join(" ")
const toTitle = this._stringToTitle(str.replace(/\(.*\)/g, ''))
return `${toTitle}${triLC}`
}
But the result of my code is not accurate. I get this:
Dota (UMA) (MIAN ISLAND) (PO) (JAB)
Is it possible to do this using only one regex?
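For reference, the output above comes from the greedy `.*` in `str.replace(/\(.*\)/g, '')`. It matches from the first `(` all the way to the last `)`, so everything after `DOTA` is removed and only the bracket list is appended afterwards. A quick check (the variable names are only for illustration):

```javascript
const sample = 'DOTA (UMA), MIAN (MIAN ISLAND), SOUTH TAKAK (PO) (JAB)';

// Greedy: a single match spanning from the first "(" to the LAST ")"
const greedy = sample.replace(/\(.*\)/g, '');

// Negated class: each parenthesized group is removed on its own
const perGroup = sample.replace(/\([^)]*\)/g, '');

console.log(JSON.stringify(greedy));   // "DOTA "
console.log(JSON.stringify(perGroup)); // "DOTA , MIAN , SOUTH TAKAK  "
```

Either way, stripping the brackets and appending them at the end loses their original positions.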
Answer:
You can use
function _changeWordToTitle(str) {
return str.replace(/(\([^()]*\))|(\S)(\S*)/g, (_, group1, group2, group3) =>
group1 || group2.toUpperCase() + group3.toLowerCase());
}
console.log( _changeWordToTitle('DOTA (UMA), MIAN (MIAN ISLAND), SOUTH TAKAK (PO) (JAB)') )
The regex is
/(\([^()]*\))|(\S)(\S*)/g
It matches any substring inside parentheses (with no nested parentheses) into Group 1. Otherwise, it matches a non-whitespace character (captured into Group 2) followed by zero or more non-whitespace characters (captured into Group 3).
If Group 1 matched, the match is returned as is (nothing is replaced). Otherwise, the first non-whitespace character (Group 2) is uppercased and the rest (Group 3) are lowercased.
See the regex demo.
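To make the group logic concrete, here is a small sketch that walks the same regex with `String.prototype.matchAll`. For each match, it reports whether Group 1 fired (the text is kept) or Groups 2/3 fired (the text is title-cased). The helper name `describeMatches` is hypothetical.

```javascript
// Hypothetical helper: lists each match of the answer's regex and
// what the replacement callback would turn it into.
function describeMatches(str) {
  const re = /(\([^()]*\))|(\S)(\S*)/g;
  const rows = [];
  for (const m of str.matchAll(re)) {
    if (m[1] !== undefined) {
      // Group 1: a parenthesized chunk, returned unchanged
      rows.push(`${m[0]} -> kept`);
    } else {
      // Groups 2 and 3: first character upper, the rest lower
      rows.push(`${m[0]} -> ${m[2].toUpperCase() + m[3].toLowerCase()}`);
    }
  }
  return rows;
}

console.log(describeMatches('DOTA (UMA), MIAN (MIAN ISLAND)').join('\n'));
```

Punctuation, such as the `,` that follows `(UMA)`, is also matched by `(\S)(\S*)`. Changing its case simply leaves it as it is.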
Answer:
It depends on how many assumptions you can make about the input.
const input = 'DOTA (UMA), MIAN (MIAN ISLAND), SOUTH TAKAK (PO) (JAB)';
For example, if you are sure that no more than two words will ever appear inside parentheses, you could use a single-RegExp approach like this one:
// risky way (too many assumptions, fails with multi words in parens)
input.replace(
/([A-Z])([A-Z]*\s+)/g,
($0, $1, $2, i, source) => (
// ignore matches that start right after an opening paren
i > 0 && source[i -1] === '(' ?
$0 :
$1 + $2.toLowerCase()
)
);
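Here is the same risky regex wrapped in a hypothetical `riskyTitle` function. It is run on the sample and on a made-up input with three words in parentheses, which shows where the assumption breaks: only the first word after `(` is protected.

```javascript
// Hypothetical wrapper around the "risky" single-regex approach above.
function riskyTitle(input) {
  return input.replace(
    /([A-Z])([A-Z]*\s+)/g,
    ($0, $1, $2, i, source) =>
      // leave a word untouched if it starts right after "("
      i > 0 && source[i - 1] === '(' ? $0 : $1 + $2.toLowerCase()
  );
}

console.log(riskyTitle('DOTA (UMA), MIAN (MIAN ISLAND), SOUTH TAKAK (PO) (JAB)'));
// Dota (UMA), Mian (MIAN ISLAND), South Takak (PO) (JAB)
console.log(riskyTitle('ABC (ONE TWO THREE)'));
// Abc (ONE Two THREE)
```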
However, if nested parentheses are never permitted, a more robust approach would be this one:
// better way (not the fastest, arguably the easiest)
input
// capitalize them all
.replace(/([A-Z])([A-Z]*)/g, (_, $1, $2) => $1 + $2.toLowerCase())
// uppercase in parens
.replace(/\(([^)]+?)\)/g, (_, $1) => `(${$1.toUpperCase()})`)
;
The caveats are that it uses two RegExps, and that it does a lot of extra work by capitalizing text and then uppercasing much of it back.
Because, judging from the input, I assume the text in parentheses will always be longer than the text before it, this solution might not be particularly fast or efficient.
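As a sketch, here is the two-pass chain wrapped in a hypothetical `twoPassTitle` function. The second input is made up to show why the no-nesting assumption matters: the lazy `[^)]+?` stops at the first `)`, so the text after an inner group is not restored to uppercase.

```javascript
// Hypothetical wrapper around the two-pass approach above.
function twoPassTitle(input) {
  return input
    // pass 1: title-case every run of capitals
    .replace(/([A-Z])([A-Z]*)/g, (_, $1, $2) => $1 + $2.toLowerCase())
    // pass 2: uppercase everything between "(" and the first ")"
    .replace(/\(([^)]+?)\)/g, (_, $1) => `(${$1.toUpperCase()})`);
}

console.log(twoPassTitle('DOTA (UMA), MIAN (MIAN ISLAND), SOUTH TAKAK (PO) (JAB)'));
// Dota (UMA), Mian (MIAN ISLAND), South Takak (PO) (JAB)
console.log(twoPassTitle('AB (CD (EF) GH)'));
// Ab (CD (EF) Gh)
```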
Edit: this one, though, comes from another answer and is included here for documentation's sake. It is a single RegExp that captures the parenthesized text as a no-op:
input.replace(
/(\([^)]*?\))|(\S)(\S*)/g,
(_, $1, $2, $3) => ($1 || ($2 + $3.toLowerCase()))
);
Last but not least, an ad-hoc parser that uses the minimum amount of heap and operations might be the best solution of them all.
// char-by-char way (how you'd do in C or without RegExp)
function adHoc(input) {
const output = [];
const {length} = input;
let i = 0;
let prev = 0;
let parens = false;
while (i < length) {
switch (true) {
case input[i] === '(':
case parens && input[i] === ')':
parens = !parens;
break;
case !parens && 'A' <= input[i] && input[i] <= 'Z':
const space = input.indexOf(' ', i);
output.push(
input.slice(prev, i + 1),
input.slice(i + 1, space).toLowerCase()
);
i = prev = space;
break;
}
i++;
}
output.push(input.slice(prev));
return output.join('');
}
adHoc(input);
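This also assumes that a parenthesized group always follows a capitalized word, so there is always a space after that word. If that assumption isn't true, input.indexOf(' ', i) returns -1 for the last word. The loop then never terminates, because i is reset to -1 and incremented back to 0. You can easily bail out when space is negative.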
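One possible way to add that bail-out is sketched below as a hypothetical `adHocGuarded` variant. When no space follows a capitalized word, this sketch assumes the word runs to the end of the string. It keeps the original's other assumptions: no nesting, and words separated from `(` by a space. It uses if/else instead of `switch (true)` only for readability.

```javascript
// Hypothetical variant of adHoc() with a guard for a missing trailing space.
function adHocGuarded(input) {
  const output = [];
  const { length } = input;
  let i = 0;
  let prev = 0;
  let parens = false;
  while (i < length) {
    const ch = input[i];
    if (ch === '(' || (parens && ch === ')')) {
      // entering or leaving a parenthesized group
      parens = !parens;
    } else if (!parens && 'A' <= ch && ch <= 'Z') {
      let space = input.indexOf(' ', i);
      // assumption: no space left means the word ends the string
      if (space < 0) space = length;
      output.push(
        input.slice(prev, i + 1),                 // text up to and incl. first letter
        input.slice(i + 1, space).toLowerCase()   // rest of the word, lowercased
      );
      i = prev = space;
    }
    i++;
  }
  output.push(input.slice(prev));
  return output.join('');
}

console.log(adHocGuarded('SOUTH TAKAK')); // South Takak
```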
The above code could technically be ported to C, Rust, or C++, so you could have a snappy utility if performance is critical and the input is potentially huge, but I'm sure JS would do a great job there regardless.
Answer:
To avoid complex regular expressions, you can write a small parser for it, something like this (see the comments within the snippet for more explanation):
console.log(parse(`DOTA (UMA), MIAN (MIAN ISLAND), SOUTH TAKAK (PO) (JAB)`));
function parse(str) {
let result = ``;
// convert [str] to Array
const str2Parse = str.toLowerCase().split(``);
// toUpper is a recursive function adding
// uppercased characters to [result] while the
// character is not a closing parenthesis
const toUpper = (cc, rest) => {
result += cc.toUpperCase();
const next = rest.shift();
if (next === `)`) {
// done, add closing parenthesis and return
result += next;
return true;
}
// continue with the next character
return toUpper(next, rest);
};
// for every character of [str2Parse]
while (str2Parse.length) {
const c = str2Parse.shift();
// if c is an opening parenthesis, continue with toUpper
// it will add uppercased characters to [result]
if (c === `(`) {
toUpper(c, str2Parse);
continue;
}
result += c;
}
return result.replace(/\b./gi, c => c.toUpperCase());
}
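The last line of `parse` is its only uncommented step. As a small illustration, the sketch below applies `/\b./gi` to the intermediate string that `parse` builds before that final replace. Here the intermediate string is written out by hand rather than produced by `parse`.

```javascript
// Hand-written copy of what parse() has built before its final replace():
// everything lowercased except the text inside parentheses.
const intermediate = 'dota (UMA), mian (MIAN ISLAND), south takak (PO) (JAB)';

// \b. matches the single character right after each word boundary: the
// first letter of every word, and also the non-word character right after
// each word, which toUpperCase() leaves unchanged.
const titled = intermediate.replace(/\b./gi, c => c.toUpperCase());

console.log(titled); // Dota (UMA), Mian (MIAN ISLAND), South Takak (PO) (JAB)
```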