How to remove special characters from Java String

Time:04-24

I'm writing code that loops through a list of folder URIs and checks that each folder actually exists on the file system. The code outputs the folders that don't exist onto a webpage. The issue is that some of the folders listed on the webpage actually exist: when I copy a URI from the webpage and paste it into File Explorer, it locates the folder. So I put a breakpoint in the code, grabbed that same URI from the debugger, and pasted it into File Explorer, and it couldn't find the folder.

URI from Webpage Output

\mycpu\go now\Harden, James  Jr. & Allen\2021

URI from Debugged Code

\mycpu\go now\Harden, James Jr. & Allen\2021

They look exactly the same. But then I tried pasting each one into Chrome to see what I'd get, and here's what I got:

URI from Webpage Output

file://mycpu//go now//Harden, James Jr. & Allen//2021/

URI from Debugged Code

file://mycpu//go now//Harden, James  Jr. & Allen//2021/

So the URI from the debugged code contained some non-breaking space characters after the name James. I know I can easily replace those with a String replace, but how do I encode each URI before checking whether the folder exists, so that these special characters don't occur in the folder URIs?

CodePudding user response:

There may be a quicker solution that didn't occur to me immediately, but the first thing that comes to mind is creating an array of the special characters and then looping through the array to remove every one of them, like so:

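String[] specialChars = {"!","@","#","$","%","^","&","*","(","}"};
for (String special : specialChars) {
    // replace() matches literally; replaceAll() would parse "$", "^", "*" and "(" as regex syntax.
    // Strings are immutable, so the result has to be assigned back.
    original = original.replace(special, ""); // original is your original string
}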

The reason the array holds strings and not chars is that String.replace(char, char) needs a replacement character, so using chars would force you to replace each special character with something like a space, which may or may not be what you want.
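
As a hedged alternative to the loop above, the same removal can be done in a single pass without regular expressions. This is only a sketch: the class name `SpecialCharStripper`, the method `strip`, and the constant `SPECIAL_CHARS` are hypothetical names introduced for illustration, and the character set simply copies the array from the answer.

```java
final class SpecialCharStripper {
    // Hypothetical constant: the same characters as the array in the answer above.
    private static final String SPECIAL_CHARS = "!@#$%^&*(}";

    // Copies every character that is not in SPECIAL_CHARS into a new string.
    static String strip(String original) {
        StringBuilder kept = new StringBuilder(original.length());
        for (int i = 0; i < original.length(); i++) {
            char c = original.charAt(i);
            if (SPECIAL_CHARS.indexOf(c) < 0) {
                kept.append(c);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("a!b@c#")); // prints "abc"
    }
}
```

Because nothing is inserted in place of a removed character, this behaves like the string-based loop and not like a char-for-char replacement. Note that this list would also strip the `&` from a folder name, which is probably not wanted for the paths in the question.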

CodePudding user response:

So, after further examination, the only issue in the URIs was the no-break spaces (U+00A0), so I just needed to remove those from the URIs. Thanks to @andrewjames for linking to a question that showed how to expose the special characters, which was a critical part of the solution. Here's the solution I came up with.

So, first I URL-encoded the URI using UTF-8, which exposed the special characters. Then I ran a String replace on the encoded URI to replace the substring for the no-break space with a plus sign (which represents a space in URL encoding), and then decoded the URI back into the standard user-friendly format that we're all used to seeing. This did the trick! Thank you all for your help! Each comment illuminated my path a bit more (see code below).

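// Needs imports for java.net.URLDecoder, java.nio.charset.StandardCharsets,
// java.io.UnsupportedEncodingException, java.util.logging.Level and java.util.logging.Logger.
private String cleanString(String uri) {
    try {
        // URL-encoding with UTF-8 turns each no-break space (U+00A0) into "%C2%A0".
        String encodedUri = java.net.URLEncoder.encode(uri, StandardCharsets.UTF_8.name());
        // "+" is the URL-encoded form of an ordinary space, so decoding yields a regular space.
        return URLDecoder.decode(encodedUri.replace("%C2%A0", "+"), StandardCharsets.UTF_8.name());
    } catch (UnsupportedEncodingException ex) {
        Logger.getLogger(ParseCSVFileImpl.class.getName()).log(Level.SEVERE, null, ex);
        return uri;
    }
}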
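
If the no-break space is the only offending character, the encode/decode round trip may not be needed. The following is a minimal sketch of a direct replacement, not the method used above: the class name `NbspCleaner` and the helper `reveal` are hypothetical, and `reveal` is just one assumed way to make hidden characters visible while debugging.

```java
final class NbspCleaner {
    // Replaces each no-break space (U+00A0) with an ordinary space (U+0020).
    static String cleanString(String uri) {
        return uri.replace('\u00A0', ' ');
    }

    // Hypothetical debugging helper: shows every control or non-ASCII
    // character as <U+XXXX> so invisible characters stand out.
    static String reveal(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20 || c > 0x7E) {
                out.append(String.format("<U+%04X>", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "James\u00A0 Jr.";
        System.out.println(reveal(raw));              // James<U+00A0> Jr.
        System.out.println(reveal(cleanString(raw))); // James  Jr.
    }
}
```

Like the version above, this leaves two ordinary spaces where the original had a no-break space followed by a space, so the cleaned string only matches a folder whose name really contains two spaces at that position.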