Using regular expression, how to remove matching sequence at the beginning and ending of the text bu

Time:12-21

My problem is very simple, but I can't figure out the correct regular expression to use.

I have the following variable (Java):

String text = "\033[1mYO\033[0m"; // this is ANSI for bold text in the Terminal

My goal is to remove the ANSI codes with a single regular expression (I just want to keep the plain text in the middle). I cannot modify the text in any way, and the ANSI codes will always be in the same place (one at the beginning, one at the end), though sometimes there may be none.

With this regular expression, I will remove them using the replaceAll method:

String plainText = text.replaceAll(unknownRegex, "");

Any idea on what the unknown regex could be?

CodePudding user response:

Found the answer thanks to a comment that disappeared for mysterious reasons.

Actually, I just need a capturing group to grab what's in the middle of the string, and then use it ($1) as the replacement for the whole match:

String plainText = text.replaceAll("\\033\\[.*m(.+)\\033\\[.*m", "$1");

Not sure if this will remove every ANSI code, but that is enough for what I want to do.
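For anyone who wants to try this out, here is a minimal sketch. The first method is the expression from this answer; the second is an alternative that is not from the answer and simply deletes every "ESC [ digits/semicolons m" sequence wherever it appears, on the assumption that only SGR (color/style) codes are present. The class and method names are made up for illustration.

```java
public class AnsiStripDemo {
    // Approach from the answer: capture the text between two escape
    // sequences and replace the whole match with that capture.
    static String stripWrapped(String text) {
        return text.replaceAll("\\033\\[.*m(.+)\\033\\[.*m", "$1");
    }

    // Alternative (assumption, not part of the answer): remove each
    // SGR sequence individually, so it also works when only one code,
    // or none at all, is present.
    static String stripAll(String text) {
        return text.replaceAll("\\033\\[[0-9;]*m", "");
    }

    public static void main(String[] args) {
        String text = "\033[1mYO\033[0m"; // bold "YO" in a terminal
        System.out.println(stripWrapped(text)); // prints YO
        System.out.println(stripAll(text));     // prints YO
        System.out.println(stripAll("YO"));     // prints YO (nothing to remove)
    }
}
```

Note that inside the regex, \033 is an octal escape understood by java.util.regex, so it matches the single ESC character that the Java string literal "\033" produces.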

CodePudding user response:

Well, you can use a single regex that has the ANSI codes optionally at the beginning and end, captures anything in between, and replaces the entire string with the value of the group: text.replaceAll("^(?:\\\\\\d+\\[1m)?(.*?)(?:\\\\\\d+\\[0m)?$", "$1") (this might not capture every ANSI code; adjust if needed).

Breaking the expression down (note that the example above escapes backslashes for Java strings so they are doubled):

  • ^ is the start of the string
  • (?:\\\d+\[1m)? matches an optional \<at least 1 digit>[1m
  • (.*?) matches any text but as little as possible, and captures it into group 1
  • (?:\\\d+\[0m)? matches an optional \<at least 1 digit>[0m
  • $ is the end of the input

In the replacement $1 refers to the value of capturing group 1 which is (.*?) in the expression.
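One caveat worth checking: as written, the expression above matches a literal backslash followed by digits, whereas the Java literal "\033" in the question compiles to a single ESC character. The sketch below assumes the text holds the real ESC character, so it swaps the backslash-plus-digits part for the regex octal escape \033 while keeping the same structure (optional start code, lazy capture, optional end code). The class and method names are hypothetical.

```java
public class OptionalAnsiDemo {
    // Optional bold-on code at the start, lazy capture of the middle,
    // optional reset code at the end; the whole string is replaced by
    // the captured group.
    static String strip(String text) {
        return text.replaceAll("^(?:\\033\\[1m)?(.*?)(?:\\033\\[0m)?$", "$1");
    }

    public static void main(String[] args) {
        System.out.println(strip("\033[1mYO\033[0m")); // prints YO
        System.out.println(strip("\033[1mYO"));        // prints YO
        System.out.println(strip("YO"));               // prints YO
    }
}
```

Because both code groups are optional and the pattern is anchored with ^ and $, text without any ANSI codes passes through unchanged. Since . does not match line terminators by default, multi-line text would need the DOTALL flag, e.g. a leading (?s).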
