How to convert uppercase text to title case with a regex, except inside brackets

I have a string like this: DOTA (UMA), MIAN (MIAN ISLAND), SOUTH TAKAK (PO) (JAB). The result I want is:

Dota (UMA), Mian (MIAN ISLAND), South Takak (PO) (JAB)

This is what I have tried so far:

// make title first
   _stringToTitle(str) {
        return str.replace(
            /\w\S*/g,
            function (txt) {
                return txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase();
            }
        )
    }
    
// get bracket
 _getBracket(str) {
        return str.match(/\(([^)]+)\)/g);
    }

// now change to title and construct them all
_changeWordToTitle(str) {
        const triLC = this._getBracket(str).join(" ")
        const toTitle = this._stringToTitle(str.replace(/\(.*\)/g, ''))
        return `${toTitle}${triLC}`
    }

But the result of my code is not what I expected:

Dota (UMA) (MIAN ISLAND) (PO) (JAB)

Is it possible to do this with a single regex?

CodePudding user response：

You can use

function _changeWordToTitle(str) {
    return str.replace(/(\([^()]*\))|(\S)(\S*)/g, (_, group1, group2, group3) => 
        group1 || group2.toUpperCase() + group3.toLowerCase());
}
console.log(_changeWordToTitle('DOTA (UMA), MIAN (MIAN ISLAND), SOUTH TAKAK (PO) (JAB)'));

The regex is

/(\([^()]*\))|(\S)(\S*)/g

It either matches a parenthesized substring (parentheses included, with no nested parentheses) into Group 1, or it matches a single non-whitespace character (captured into Group 2) followed by zero or more non-whitespace characters (captured into Group 3).

If Group 1 matches, the match is returned as is (no replacement takes place); otherwise, the first non-whitespace character (Group 2) is uppercased and the rest (Group 3) is lowercased.
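
To make the two alternatives easier to see, here is a minimal sketch that lists which branch caught each token. The helper name traceMatches is hypothetical (it is not part of the answer above); it only relies on the standard String.prototype.matchAll.

```javascript
// Hypothetical helper: report which alternative of the regex matched each token.
function traceMatches(str) {
  const re = /(\([^()]*\))|(\S)(\S*)/g;
  return Array.from(str.matchAll(re), (m) =>
    m[1] !== undefined
      ? { kind: 'parens', text: m[1] }              // Group 1: kept as is
      : { kind: 'word', first: m[2], rest: m[3] }   // Groups 2 and 3: title-cased
  );
}

console.log(traceMatches('SOUTH TAKAK (PO)'));
```

For this input the first two entries are word matches (S + OUTH, T + AKAK) and the last one is the parenthesized match (PO). Note that a lone comma is also a word match, with an empty Group 3, so it passes through unchanged.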

CodePudding user response：

It depends on how many assumptions you can make about the input.

const input = 'DOTA (UMA), MIAN (MIAN ISLAND), SOUTH TAKAK (PO) (JAB)';

For example, if you are sure that no more than two words will ever appear inside parentheses, you could use a single-RegExp approach like this one:

// risky way (too many assumptions, fails with more than two words in parens)
input.replace(
  /([A-Z])([A-Z]*\s+)/g,
  ($0, $1, $2, i, source) => (
    // leave matches that directly follow an opening paren untouched
    i > 0 && source[i - 1] === '(' ?
      $0 :
      $1 + $2.toLowerCase()
  )
);
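
To illustrate why the two-word assumption matters, here is a small sketch that wraps the same replace call in a function (the name riskyTitle is hypothetical) and runs it on a two-word and a three-word group.

```javascript
// Hypothetical wrapper around the "risky" single-RegExp approach.
function riskyTitle(input) {
  return input.replace(
    /([A-Z])([A-Z]*\s+)/g,
    ($0, $1, $2, i, source) =>
      // a word right after "(" is left alone; any other word followed by whitespace is title-cased
      i > 0 && source[i - 1] === '(' ? $0 : $1 + $2.toLowerCase()
  );
}

console.log(riskyTitle('MIAN (MIAN ISLAND)'));     // Mian (MIAN ISLAND)
console.log(riskyTitle('MIAN (MIAN BIG ISLAND)')); // Mian (MIAN Big ISLAND)
```

The middle word of a three-word group is neither preceded by an opening parenthesis nor followed by a closing one, so it is title-cased by mistake.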

However, if nested parentheses never occur, a more robust approach would be this one:

// better way (not the fastest, arguably the easiest)
input
  // capitalize them all
  .replace(/([A-Z])([A-Z]*)/g, (_, $1, $2) => $1 + $2.toLowerCase())
  // uppercase in parens
  .replace(/\(([^)]+?)\)/g, (_, $1) => `(${$1.toUpperCase()})`)
;

The only caveat is that it uses two RegExps, and it also does a lot of extra work by capitalizing and then uppercasing back much of the text.

Judging from the input, I assume the text in parentheses will often be longer than the text before it, so the previous solution might not be very fast or efficient.

Edit: this one is taken from another answer and is included here for documentation's sake. It is a single RegExp that captures the parenthesized text and returns it unchanged (a no-op):

input.replace(
  /(\([^)]*?\))|(\S)(\S*)/g,
  (_, $1, $2, $3) => ($1 || ($2 + $3.toLowerCase()))
);

Last but not least, an ad-hoc parser that uses a minimal amount of heap and operations might be the best solution of them all.

// char-by-char way (how you'd do in C or without RegExp)
function adHoc(input) {
  const output = [];
  const {length} = input;
  let i = 0;
  let prev = 0;
  let parens = false;
  while (i < length) {
    switch (true) {
      case input[i] === '(':
      case parens && input[i] === ')':
        parens = !parens;
        break;
      case !parens && 'A' <= input[i] && input[i] <= 'Z':
        const space = input.indexOf(' ', i);
        output.push(
          input.slice(prev, i + 1),
          input.slice(i + 1, space).toLowerCase()
        );
        i = prev = space;
        break;
    }
    i++;
  }
  output.push(input.slice(prev));
  return output.join('');
}

adHoc(input);

This also assumes that every capitalized word outside parentheses is followed by a space somewhere later in the string (for example, the one before an opening parenthesis). If that assumption does not hold, indexOf returns -1, and you could easily bail out when space is negative.
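
One possible way to handle the missing-space case is sketched below. The name adHocSafe is hypothetical, and treating "no space found" as "the word runs to the end of the input" is an assumption about the desired behavior, not something stated above.

```javascript
// Hypothetical variant of adHoc that tolerates a trailing word with no space after it.
function adHocSafe(input) {
  const output = [];
  const {length} = input;
  let i = 0;
  let prev = 0;
  let parens = false;
  while (i < length) {
    const ch = input[i];
    if (ch === '(' || (parens && ch === ')')) {
      parens = !parens;
    } else if (!parens && 'A' <= ch && ch <= 'Z') {
      let space = input.indexOf(' ', i);
      // assumption: with no space left, the word extends to the end of the input
      if (space < 0) space = length;
      output.push(
        input.slice(prev, i + 1),
        input.slice(i + 1, space).toLowerCase()
      );
      i = prev = space;
    }
    i++;
  }
  output.push(input.slice(prev));
  return output.join('');
}

console.log(adHocSafe('SOUTH TAKAK')); // South Takak
```

Without the guard, a space value of -1 would reset i to -1 and the loop would start over from the beginning of the input.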

The code above could be ported to C, Rust, or C++, so you can have a snappy utility if performance is critical and the input is potentially huge, but I am sure JS would do a great job there regardless.

CodePudding user response：

To avoid using complex regular expressions, you can create a small parser for it, something like (see comments within the snippet for more explanation):

console.log(parse(`DOTA (UMA), MIAN (MIAN ISLAND), SOUTH TAKAK (PO) (JAB)`));

function parse(str) {
  let result = ``;
  // convert [str] to Array
  const str2Parse = str.toLowerCase().split(``);
  
  // toUpper is a recursive function adding
  // uppercased characters to [result] while the 
  // character is not a closing parenthesis
  const toUpper = (cc, rest) => {
    result += cc.toUpperCase();
    const next = rest.shift();
    
    if (next === `)`) { 
      // done, add closing parenthesis and return
      result += next; 
      return true;
    }
    // continue with the next character
    return toUpper(next, rest);
  };
  
  // for every character of [str2Parse]
  while (str2Parse.length) {
    const c = str2Parse.shift();
    
    // if c is an opening parenthesis, continue with toUpper
    // it will add uppercased characters to [result]
    if (c === `(`) {
      toUpper(c, str2Parse);
      continue;
    } 
   
    result += c;
  }
  
  return result.replace(/\b./gi, c => c.toUpperCase());
}
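
The same idea can also be sketched as a single pass with two flags instead of recursion and repeated shift() calls. The name parseIterative is hypothetical, and the sketch assumes that parentheses are balanced and never nested and that words consist of ASCII letters.

```javascript
// Hypothetical single-pass variant: uppercase inside parentheses, title case elsewhere.
function parseIterative(str) {
  let result = '';
  let inParens = false;   // true from "(" up to and including the matching ")"
  let startOfWord = true; // true when the next letter would begin a new word
  for (const ch of str) {
    if (ch === '(') inParens = true;
    if (inParens) {
      result += ch.toUpperCase();
    } else if (/[A-Za-z]/.test(ch)) {
      result += startOfWord ? ch.toUpperCase() : ch.toLowerCase();
    } else {
      result += ch;
    }
    if (ch === ')') inParens = false;
    // a new word starts after any character that is not a letter, digit, or underscore
    startOfWord = !/[A-Za-z0-9_]/.test(ch);
  }
  return result;
}

console.log(parseIterative('DOTA (UMA), MIAN (MIAN ISLAND), SOUTH TAKAK (PO) (JAB)'));
// Dota (UMA), Mian (MIAN ISLAND), South Takak (PO) (JAB)
```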