Regex for complex uppercase-lowercase scenarios

Time:01-01

I'm working on an app that adapts text to braille specifications. It has some tricky rules for handling uppercase letters, and I'd like some help. The rules are:

  1. Before a single uppercase letter, add ":"

:This is an :Example

  2. Before runs of multiple uppercase letters and before all-caps words, add another ":"

:This is ::ANOTHER ex::AMple, ::ALRIGHT

  3. If more than three all-caps words appear in a row, add "-" to the beginning of the sequence, keep the "::" on the first and last words, and delete all the other "::" within that sequence

:This is -::A VERY LONG SENTENCE WITH A SEQUENCE OF ALL ::CAPS to serve ::AS ::AN :Example

  4. Finally, if a word switches from uppercase to lowercase mid-word (except after a single capitalized first letter, as in Example), add ";" after the last uppercase letter

:This is my fin:A;l ::EXAM;ple

Working with regex, I was able to handle the simple rules, but not all of them.

// adds : before every run of uppercase letters
   var firstChange = text.replace(/[A-Z]+/g, ':$&');

// adds a second : before runs of two or more uppercase letters
   var secondChange = firstChange.replace(/[A-Z]{2,}/g, ':$&');

// adds ; where a word changes from uppercase to lowercase
   var thirdChange = secondChange.replace(/\B[A-Z]+(?=[a-z])/g, '$&;');
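To see where the replacements collide, they can be wrapped in a small function and run in Node. This is only a sketch: attempt is a hypothetical name, and the regexes are the three above. Two problems show up in the output: the single-letter word A gets only one colon, and the capital in finAl gets no semicolon, because the position between the inserted : and the A is a word boundary, so \B does not match there.

```javascript
// Hypothetical wrapper around the three replacements from the question.
function attempt(text) {
  // adds : before every run of uppercase letters
  const firstChange = text.replace(/[A-Z]+/g, ':$&');
  // adds a second : before runs of two or more uppercase letters
  const secondChange = firstChange.replace(/[A-Z]{2,}/g, ':$&');
  // adds ; after capitals followed by lowercase, unless the match starts at a word boundary
  return secondChange.replace(/\B[A-Z]+(?=[a-z])/g, '$&;');
}

console.log(attempt("This is A TEST for exAMple")); // :This is :A ::TEST for ex::AM;ple
console.log(attempt("my finAl EXAMple"));           // my fin:Al ::EXAM;ple
```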

I tried building up from simple to complex, then the other way around, then merging some rules, but in every case the replacements conflict. I'm new to regex and could use any insight on how to solve this. Thank you.

Edit: To make it clearer, here is a final example that combines all the rules.

This is an Example. This is ANOTHER exAMple, ALRIGHT? This is A VERY LONG SENTENCE WITH A SEQUENCE OF ALL CAPS to serve AS AN Example. This is my finAl EXAMple.

Should become:

:This is an :Example. :This is ::ANOTHER ex::AM;ple, ::ALRIGHT? :This is -::A VERY LONG SENTENCE WITH A SEQUENCE OF ALL ::CAPS to serve ::AS ::AN :Example. :This is my fin:A;l ::EXAM;ple.


SOLVED: With the help of @ChrisMaurer and @SaSkY, here is the code to solve the above problem:

var original = document.getElementById("area1");
var another = document.getElementById("area2");

function MyFunction() {

  // include : before every run of uppercase letters
  var firstChange = original.value.replace(/[A-Z]+/g, ':$&');

  // add one more : before runs of two or more uppercase letters and before all-caps words such as A and I
  var secondChange = firstChange.replace(/([A-Z]{2,}|\b[A-Z]+\b)/g, ':$&');

  // add - to the beginning of a long uppercase sequence (four or more words)
  var thirdChange = secondChange.replace(/\B(::[A-Z]+(\s+::[A-Z]+){3,})/g, '-$&');

  // remove the extra :: before words inside a long uppercase sequence
  var fourthChange = thirdChange.replace(/(?<=::[A-Z]+\s*)::([A-Z]+)(?=\s*::[A-Z]+)/g, '$1');

  // add a lowercase symbol when a word changes from uppercase to lowercase mid-word
  var fifthChange = fourthChange.replace(/\B[A-Z](?=[a-z])/g, '$&;');

  // write the result to the second textarea
  another.value = fifthChange;
}
<html>
<body>
<textarea id="area1"  rows="4" cols="40" onkeyup="MyFunction()">
</textarea>
<textarea id="area2" rows="4" cols="40"></textarea>
</body>
</html>
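For testing outside the page, the same replacements can be chained in a function with no DOM access and checked against the expected output from the question, using exAMple as the input word (as in the rule 2 example). This is a sketch, not the code above verbatim: toBrailleCaps is a hypothetical name, and the last step is an assumed tweak. As far as I can tell, the original /\B[A-Z](?=[a-z])/g does not match the A in fin:Al, because the position between : and A is a word boundary, so finAl comes out as fin:Al instead of fin:A;l. The tweak instead requires the capital to be preceded by a letter, optionally through the inserted colons.

```javascript
// DOM-free sketch of the five replacements, so they can be run and tested in Node.
// toBrailleCaps is a hypothetical name. Steps 1-4 use the regexes from the solved
// code; step 5 is an assumed tweak (the original was /\B[A-Z](?=[a-z])/g).
function toBrailleCaps(text) {
  return text
    // 1. ":" before every run of uppercase letters
    .replace(/[A-Z]+/g, ':$&')
    // 2. another ":" before runs of 2+ capitals and before all-caps words (A, I)
    .replace(/([A-Z]{2,}|\b[A-Z]+\b)/g, ':$&')
    // 3. "-" before a run of four or more ::WORDS
    .replace(/\B(::[A-Z]+(\s+::[A-Z]+){3,})/g, '-$&')
    // 4. drop "::" from words that have a ::WORD on both sides
    .replace(/(?<=::[A-Z]+\s*)::([A-Z]+)(?=\s*::[A-Z]+)/g, '$1')
    // 5. ";" after a capital followed by lowercase, when a letter precedes the
    //    capital (possibly through inserted colons), so word-initial capitals are skipped
    .replace(/(?<=[A-Za-z]:*)[A-Z](?=[a-z])/g, '$&;');
}

const input = "This is an Example. This is ANOTHER exAMple, ALRIGHT? This is A VERY LONG " +
  "SENTENCE WITH A SEQUENCE OF ALL CAPS to serve AS AN Example. This is my finAl EXAMple.";
console.log(toBrailleCaps(input));
```

With the original fifth step, the output is identical except that fin:Al stays without the semicolon.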

Answer:

So I think your approach is good, and the first replace seems to get the single colons into the right place. The second one misses single-letter words like A and I, which also need a double colon. I would fix that with an added alternation:

/([A-Z]{2,}|\b[A-Z]+\b)/g
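A quick check of the alternation on text that has already been through the first replacement (the sample string is made up for illustration): the first branch catches runs of capitals inside a word, like AM in exAMple, and the second catches whole uppercase words of any length, including A and I.

```javascript
// Second replacement with the added alternation, applied to first-pass output.
const afterFirst = ":This is :A :TEST and :I agree, ex:AMple";
const afterSecond = afterFirst.replace(/([A-Z]{2,}|\b[A-Z]+\b)/g, ':$&');
console.log(afterSecond); // :This is ::A ::TEST and ::I agree, ex::AMple
```

The T in :This is left alone: it is only one capital, and there is no word boundary between T and h.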

Now you need two more replacements: one to add the hyphen and one to remove the inner double colons.

For the hyphen, you just search for three or more ::ALLCAPS words separated by whitespace, like this:

/\B(::[A-Z]+(\s+::[A-Z]+){2,})/g

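The \B (rather than \b) is needed because : is not a word character: \B still matches at the very beginning of the string and after a space, but not directly after a letter. I replaced each match with a hyphen followed by $1.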
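Here is the hyphen step on its own, run on second-pass output invented for the example. It also shows the effect of the quantifier: {2,} means three or more words (as above), while {3,} means four or more, which matches "more than three" in the question's rule and is what the final code in the question uses. The helper name addHyphen and its minRepeats parameter are hypothetical.

```javascript
// Hypothetical helper: prefix "-" to a run of ::WORDS. minRepeats is the number of
// extra ::WORDS required after the first one (2 => three or more words in total).
function addHyphen(text, minRepeats) {
  const pattern = new RegExp(`\\B(::[A-Z]+(\\s+::[A-Z]+){${minRepeats},})`, 'g');
  return text.replace(pattern, '-$1');
}

const sample = "::GO ::TO ::THE store, ::A ::VERY ::LONG ::RUN of words";
console.log(addHyphen(sample, 2)); // -::GO ::TO ::THE store, -::A ::VERY ::LONG ::RUN of words
console.log(addHyphen(sample, 3)); // ::GO ::TO ::THE store, -::A ::VERY ::LONG ::RUN of words
```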

To remove the double colons, I got a little trickier with a lookbehind and a lookahead:

/(?<=::[A-Z]+\s*)::([A-Z]+)(?=\s*::[A-Z]+)/g

Each match is just replaced with $1, the word without its colons. Luckily, JavaScript supports variable-length lookbehinds (lookbehind assertions were added in ES2018).
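Below is the colon-removal step run on strings shaped like the hyphen step's output (invented for illustration). One thing worth checking, as far as I can tell: the lookbehind and lookahead only look for neighbouring ::WORDS, not for the hyphen, so if the hyphen threshold is raised to {3,}, a run of exactly three all-caps words loses its middle :: without getting a hyphen. The name removeInner is hypothetical.

```javascript
// Remove "::" from every uppercase word that has another ::WORD on both sides.
const removeInner = (text) =>
  text.replace(/(?<=::[A-Z]+\s*)::([A-Z]+)(?=\s*::[A-Z]+)/g, '$1');

console.log(removeInner("-::A ::VERY ::LONG ::RUN of words")); // -::A VERY LONG ::RUN of words
// A three-word run that did not get a hyphen is still trimmed:
console.log(removeInner("::GO ::TO ::THE store")); // ::GO TO ::THE store
```

The first and last words keep their colons because each is missing a ::WORD on one side.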

Here it is working on Regex101.

I did not look at your last replacement. Superficially it seemed to be OK.
