Home > Blockchain >  REGEXREPLACE all characters until at least 4 consecutive capital letters
REGEXREPLACE all characters until at least 4 consecutive capital letters

Time:10-24

In a very long string, I'd like to remove everything that starts with SPAIN: until we get at least 4 capital letters in a row ending with a : (ie. INDIA: or SOUTH AFRICA: or NORTHERN-IRELAND: ).

My attempt where A2 is the very long string:

=REGEXREPLACE(A2,"SPAIN: .*([A-ZÀ-ÿ\-\' ]{4,}): ","$1: ")

Unfortunately the above doesn't work as it's erasing everything until the very last 4 capital letters in the string.

Any ideas?

Sheet example to play with here: enter image description here

CodePudding user response:

I have added a new sheet ("Erik Help") to your sample spreadsheet. The following formula is in B2 of that sheet:

=ArrayFormula(IF(A2:A="",,REGEXREPLACE(REGEXREPLACE(A2:A,IF(LEFT(A2:A,4)="ESPA","^. ("&REGEXEXTRACT(A2:A,"-:-([A-ZÀ-Ý]{4}[^:] )")&")","ESPAGNE[^~] -:-([A-ZÀ-Ý]{4}[^:] )"),"$1"),"ESPA. $","")))

This retrieves the correct result for all of the cases in your sample.

  • Related