How to change uppercase words to title case with a regex, except text inside brackets

Time:12-02

I have a string like this: DOTA (UMA), MIAN (MIAN ISLAND), SOUTH TAKAK (PO) (JAB). The result I expect is:

Dota (UMA), Mian (MIAN ISLAND), South Takak (PO) (JAB)

This is what I have tried so far:

// make title first
   _stringToTitle(str) {
        return str.replace(
            /\w\S*/g,
            function (txt) {
                return txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase();
            }
        )
    }
    
// get bracket
 _getBracket(str) {
        return str.match(/\(([^)]+)\)/g);
    }

// now change to title and construct them all
_changeWordToTitle(str) {
        const triLC = this._getBracket(str).join(" ")
        const toTitle = this._stringToTitle(str.replace(/\(.*\)/g, ''))
        return `${toTitle}${triLC}`
    }

But the result of my code is not what I expect:

Dota (UMA) (MIAN ISLAND) (PO) (JAB)

Is it possible to do this with a single regex?

CodePudding user response:

You can use

function _changeWordToTitle(str) {
    return str.replace(/(\([^()]*\))|(\S)(\S*)/g, (_, group1, group2, group3) => 
        group1 || group2.toUpperCase() + group3.toLowerCase());
}
console.log( _changeWordToTitle('DOTA (UMA), MIAN (MIAN ISLAND), SOUTH TAKAK (PO) (JAB)') )

The regex is

/(\([^()]*\))|(\S)(\S*)/g

It matches a parenthesized substring (with no nested parentheses) into Group 1, or else it matches a single non-whitespace character (captured into Group 2) followed by zero or more non-whitespace characters (captured into Group 3).

If Group 1 matches, the match is returned as is (no replacement takes place). Otherwise, the first non-whitespace character (Group 2) is uppercased and the rest of the word (Group 3) is lowercased.

See the regex demo.
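
To make the group behaviour described above easier to see, here is a small sketch that lists which group each match fills. The helper name `describeMatches` is hypothetical and only for illustration; it uses the same regex with `String.prototype.matchAll`.

```javascript
// Illustration only: report which capture group each match fills.
const pattern = /(\([^()]*\))|(\S)(\S*)/g;

function describeMatches(str) {
  return [...str.matchAll(pattern)].map(([whole, group1, group2, group3]) =>
    group1 !== undefined
      // Group 1 matched: a parenthesized chunk that would be kept as is
      ? { whole, kind: 'parenthesized' }
      // Groups 2 and 3 matched: first character and the rest of the word
      : { whole, kind: 'word', first: group2, rest: group3 }
  );
}

console.log(describeMatches('SOUTH TAKAK (PO) (JAB)'));
```

For this input there should be four matches: two words (`SOUTH`, `TAKAK`) and two parenthesized chunks (`(PO)`, `(JAB)`).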

CodePudding user response:

It depends on how many assumptions you can make about the input.

const input = 'DOTA (UMA), MIAN (MIAN ISLAND), SOUTH TAKAK (PO) (JAB)';

For example, if you are sure that there will never be more than two words inside parentheses, then you could use a single-RegExp approach like this one:

// risky way (too many assumptions, fails with multi words in parens)
input.replace(
  /([A-Z])([A-Z]*\s+)/g,
  ($0, $1, $2, i, source) => (
    // ignore matches that start right after an opening paren
    i > 0 && source[i - 1] === '(' ?
      $0 :
      $1 + $2.toLowerCase()
  )
  )
);
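
To show where the two-word assumption breaks, here is a small sketch that wraps the same replacement in a function (the name `riskyTitle` and the `PO BOX NINE` sample are made up for illustration) and runs it on two-word and three-word parentheses.

```javascript
// Same regex and callback as the risky approach, wrapped for reuse.
function riskyTitle(input) {
  return input.replace(
    /([A-Z])([A-Z]*\s+)/g,
    ($0, $1, $2, i, source) =>
      // keep a word that directly follows an opening paren
      i > 0 && source[i - 1] === '(' ? $0 : $1 + $2.toLowerCase()
  );
}

// Two words in parens: the first is skipped, the second is not followed
// by whitespace, so neither is touched.
console.log(riskyTitle('MIAN (MIAN ISLAND)'));        // Mian (MIAN ISLAND)

// Three words in parens: the middle word is followed by a space and is not
// preceded by '(', so it gets title-cased by mistake.
console.log(riskyTitle('SOUTH TAKAK (PO BOX NINE)')); // South Takak (PO Box NINE)
```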

However, if nested parentheses are never allowed, a more robust approach would be this one:

// better way (not the fastest, arguably the easiest)
input
  // capitalize them all
  .replace(/([A-Z])([A-Z]*)/g, (_, $1, $2) => $1 + $2.toLowerCase())
  // uppercase in parens
  .replace(/\(([^)]+?)\)/g, (_, $1) => `(${$1.toUpperCase()})`)
;

The caveats are that it uses two RegExps, and that it does a lot of extra work by first title-casing everything and then uppercasing the text in parentheses back again.

Since, judging from the input, I assume the text in parentheses will usually be longer than the text before it, the previous solution might not be very fast or efficient.

Edit: this one is taken from the other answer and included here for documentation's sake. It is a single RegExp that captures the parenthesized text and returns it unchanged (a no-op):

input.replace(
  /(\([^)]*?\))|(\S)(\S*)/g,
  (_, $1, $2, $3) => ($1 || ($2 + $3.toLowerCase()))
);

Last, but not least, some ad-hoc parser that uses the minimum amount of heap and operations might be the best solution of them all.

// char-by-char way (how you'd do in C or without RegExp)
function adHoc(input) {
  const output = [];
  const {length} = input;
  let i = 0;
  let prev = 0;
  let parens = false;
  while (i < length) {
    switch (true) {
      case input[i] === '(':
      case parens && input[i] === ')':
        parens = !parens;
        break;
      case !parens && 'A' <= input[i] && input[i] <= 'Z':
        const space = input.indexOf(' ', i);
        output.push(
          input.slice(prev, i + 1),
          input.slice(i + 1, space).toLowerCase()
        );
        i = prev = space;
        break;
    }
    i++;
  }
  output.push(input.slice(prev));
  return output.join('');
}

adHoc(input);

This also assumes that every capitalized word outside parentheses is eventually followed by parentheses, and hence by a space before them. If that assumption does not hold, you could easily bail out when space is negative.
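
As a minimal sketch of that guard (the name `adHocSafe` is hypothetical, and treating the end of the string as the end of the word is one possible choice, not the only one), the same scanner can handle `indexOf` returning -1 when no space follows the word:

```javascript
// Variant of the char-by-char scanner that tolerates a trailing word
// with no space after it.
function adHocSafe(input) {
  const output = [];
  let prev = 0;
  let parens = false;
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (ch === '(' || (parens && ch === ')')) {
      parens = !parens;
    } else if (!parens && 'A' <= ch && ch <= 'Z') {
      let space = input.indexOf(' ', i);
      // Assumption: with no space left, the word runs to the end of the input.
      if (space < 0) space = input.length;
      output.push(
        input.slice(prev, i + 1),              // untouched text plus the capital
        input.slice(i + 1, space).toLowerCase() // rest of the word, lowercased
      );
      i = prev = space;
    }
  }
  output.push(input.slice(prev));
  return output.join('');
}

console.log(adHocSafe('SOUTH TAKAK')); // South Takak
```

Without the guard, `space` would be -1 for the last word, so `i` and `prev` would be reset to -1 and the scan would start over from the beginning.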

The code above could be ported to C, Rust, or C++, so you can have a snappy utility if performance is critical and the input is potentially huge, but I am sure JS would do a great job there regardless.

CodePudding user response:

To avoid using complex regular expressions, you can create a small parser for it, something like (see comments within the snippet for more explanation):

console.log(parse(`DOTA (UMA), MIAN (MIAN ISLAND), SOUTH TAKAK (PO) (JAB)`));

function parse(str) {
  let result = ``;
  // convert [str] to Array
  const str2Parse = str.toLowerCase().split(``);
  
  // toUpper is a recursive function adding
  // uppercased characters to [result] while the 
  // character is not a closing parenthesis
  const toUpper = (cc, rest) => {
    result += cc.toUpperCase();
    const next = rest.shift();

    if (next === `)`) {
      // done, add closing parenthesis and return
      result += next;
      return true;
    }
    // continue with the next character
    return toUpper(next, rest);
  };

  // for every character of [str2Parse]
  while (str2Parse.length) {
    const c = str2Parse.shift();

    // if c is an opening parenthesis, continue with toUpper
    // it will add uppercased characters to [result]
    if (c === `(`) {
      toUpper(c, str2Parse);
      continue;
    }

    result += c;
  }
  
  return result.replace(/\b./gi, c => c.toUpperCase());
}