Replace tab with a space inside a String

Time:06-10

I can't find a solution to this very simple problem: I would like to replace tabs in a String with whitespace (a single space only). For example, I have a String like this:

        Hello        World!        
New line

And I would like to get this as a result:

Hello World! 
New line

For this, I used: myStr.replaceAll("\\s+", " ");

The tabs are removed as expected, but so is the line break: Hello World! New line

I also tried replaceAll with "[\\t ]" as the pattern, but when I replace with a single space, nothing seems to change.

I must be missing a simple solution, but I don't see it.

CodePudding user response:

You need to use the following pattern to match runs of contiguous tabs and spaces:

[\\t ]+

For regular expressions it's always a good idea to test them out using a tool like https://regexr.com/

There you can enter your sample and the regular expression and it even explains what's going on.
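A minimal sketch of this suggestion in Java, assuming the `+` quantifier after the character class so that each run of tabs and spaces becomes one space. The class and method names (`TabCollapser`, `collapse`) are made up for illustration.

```java
class TabCollapser {
    // Replaces each run of tabs and/or spaces with a single space.
    // Line breaks are not part of the character class, so they are left alone.
    static String collapse(String text) {
        return text.replaceAll("[\\t ]+", " ");
    }

    public static void main(String[] args) {
        String input = "\tHello\t\tWorld!\t\nNew line";
        // Strings are immutable: replaceAll returns a new String,
        // so the result has to be assigned or used directly.
        String result = collapse(input);
        System.out.println(result);
    }
}
```

Note that a leading tab becomes a leading space here. Also, since `replaceAll` never modifies the original String, calling it without using the return value looks as if nothing changed, which may be what happened in the question.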

CodePudding user response:

Whitespace is any character or series of characters that represent horizontal or vertical space in typography.

Issue

In your example, \s matched and replaced all of the following:

  • regular space, like " " (horizontal)
  • tab like \t (horizontal)
  • carriage-return like \r (vertical)
  • new-line or line-feed like \n (vertical)

See this substitution demo for Java's regex-flavor.
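The effect can be reproduced with a small sketch. The class name `WhitespaceDemo` and the sample input are assumptions for illustration, not taken from the linked demo.

```java
class WhitespaceDemo {
    // Collapses every run of whitespace, including line breaks, into one space.
    static String collapseAll(String text) {
        return text.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String input = "Hello\t\tWorld!\t\nNew line";
        // The tab followed by the newline forms one whitespace run,
        // so the line break is replaced together with the tab.
        System.out.println(collapseAll(input)); // prints: Hello World! New line
    }
}
```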

Alternative Solutions

In Java you could easily condense this horizontal whitespace with:

(1) Split by lines and clean each line separately

See the demo on IdeOne:

import java.util.ArrayList;
import java.util.List;

String multiLineText = "\tHello        World!" + "\n"
    + "New line";

String lineSeparatorRegex = "\r?\n"; // matches "\n" (Linux/Mac) as well as "\r\n" (Windows)

List<String> condensedLines = new ArrayList<>();
// split on the regex; System.lineSeparator() would only match the platform's own separator
String[] lines = multiLineText.split(lineSeparatorRegex);
for (String line : lines) {
  condensedLines.add(line.replaceAll("\\s+", " ")); // condense each whitespace run to one space
}
String condensedPerLine = String.join(System.lineSeparator(), condensedLines);

Note: System.getProperty("line.separator") is the older way, used before System.lineSeparator() was introduced in Java 1.7.

(2) Simple multi-line capable regex

as answered by Niko:

// collapse each run of tabs and spaces into a single space; line breaks are untouched
String condensedPerLine = multiLineText.replaceAll("[\t ]+", " ");

See on Regex101: demo preserving lines.
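As a variation on this approach (not part of Niko's answer), Java 8 and later also support the predefined class `\h` for horizontal whitespace, which covers tab and space plus some Unicode space characters. A sketch, with the hypothetical names `HorizontalWhitespace` and `collapseHorizontal`:

```java
class HorizontalWhitespace {
    // \h matches horizontal whitespace only (Java 8+), so "\n" and "\r" survive.
    static String collapseHorizontal(String text) {
        return text.replaceAll("\\h+", " ");
    }

    public static void main(String[] args) {
        String input = "Hello\t \tWorld!\nNew\tline";
        System.out.println(collapseHorizontal(input));
    }
}
```

Whether `\h` or the explicit `[\t ]` reads better is a matter of taste; the explicit class makes it obvious which characters are affected.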

(3) Use Apache StringUtils with streaming:

The StringUtils class is well suited to null-safe String handling; for this case, use normalizeSpace(s), which also trims leading and trailing whitespace. Note that its JavaDoc also contains this hint:

Java's regexp pattern \s defines whitespace as [ \t\n\x0B\f\r]

// requires org.apache.commons.lang3.StringUtils (Apache Commons Lang 3)
// trim each line and collapse its superfluous whitespace
String condensedPerLine = Arrays.stream(multiLineText.split(System.lineSeparator()))
    .map(s -> StringUtils.normalizeSpace(s))
    .collect(Collectors.joining(System.lineSeparator()));
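If adding the Apache dependency is not an option, a rough JDK-only approximation of the same per-line idea might look like the sketch below. It is not Apache's implementation: it simply trims each line and collapses whitespace runs, and the names `PerLineNormalizer` and `normalizeLines` are hypothetical. It assumes Java 8+, where `\R` matches any line break.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class PerLineNormalizer {
    // Splits on any line break (\R covers "\n", "\r\n" and "\r"),
    // trims each line, collapses inner whitespace runs, and rejoins with "\n".
    // Note: String.split drops trailing empty lines.
    static String normalizeLines(String text) {
        return Arrays.stream(text.split("\\R"))
            .map(line -> line.trim().replaceAll("\\s+", " "))
            .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        System.out.println(normalizeLines("\tHello\t\tWorld!\t\r\nNew line"));
    }
}
```

Joining with a fixed "\n" keeps the output independent of the platform; swap in System.lineSeparator() if platform-specific line endings are wanted.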
