Java. Extracting character from array that isn't ASCII

Time:11-09

I'm trying to extract a particular non-ASCII character from a buffer. I'm reading a file of movie names, some of which have non-ASCII characters sprinkled in, like so:

1|Tóy Story (1995)
2|GoldenEye (1995)
3|Four Rooms (1995)
4|Gét Shorty (1995)

I was able to pick out the lines that contain non-ASCII characters, but I can't figure out how to get the offending character from those lines and replace it with an ASCII character from the map I've made.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class Main {
    public static void main(String[] args) {

        HashMap<Character, Character>Char_Map = new HashMap<>();
        Char_Map.put('o','ó');
        Char_Map.put('e','é');
        Char_Map.put('i','ï');

        for(Map.Entry<Character,Character> entry: Char_Map.entrySet())
        {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }

        try
        {
            BufferedReader br = new BufferedReader(new FileReader("movie-names.txt"));
            String contentLine= br.readLine();


            while(contentLine != null)
            {
                String[] contents = contentLine.split("\\|");
                boolean result = contents[1].matches("\\A\\p{ASCII}*\\z");

                if(!result)
                {
                    System.out.println(contentLine);

                    

                    //System.out.println();
                }

                contentLine= br.readLine();

            }
        }
        catch (IOException ioe)
        {
            System.out.println("Cannot open file as it doesn't exist");
        }
    }
}

I tried using something along the lines of:

if (contentLine.charAt(i) == something)

But I'm not sure.
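The charAt(i) idea can be completed: every ASCII character has a value from 0 to 127, so any char above 127 is one of the characters to replace. A minimal sketch, with a hypothetical nonAsciiIndices helper. It assumes the file was decoded correctly (for example as UTF-8), and it examines UTF-16 chars one at a time, so a character outside the Basic Multilingual Plane would show up as two entries.

```java
import java.util.ArrayList;
import java.util.List;

class NonAsciiScan {
    // Hypothetical helper: collects the index of every char outside ASCII (0..127).
    static List<Integer> nonAsciiIndices(String line) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) > 127) {
                indices.add(i);
            }
        }
        return indices;
    }

    public static void main(String[] args) {
        String line = "1|T\u00f3y Story (1995)"; // "1|Tóy Story (1995)"
        for (int i : nonAsciiIndices(line)) {
            System.out.println("index " + i + ": " + line.charAt(i));
        }
    }
}
```

Once you have the index, line.charAt(i) gives you the character to look up in your map.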

Answer:

You can just use replaceAll. Put this in the while loop, so that it works on each line you read from the file. With this change, you won't need the split and if (... matches) anymore.

contentLine = contentLine.replaceAll("ó", "o");
contentLine = contentLine.replaceAll("é", "e");
contentLine = contentLine.replaceAll("ï", "i");
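Note that replaceAll does not change the string it is called on: Java strings are immutable, so each call returns a new String that has to be assigned back. This sketch, with a hypothetical clean helper, contrasts a discarded call with reassigned ones.

```java
class ImmutableStrings {
    // Hypothetical helper: applies the three replacements, keeping each result.
    static String clean(String line) {
        line = line.replaceAll("\u00f3", "o"); // ó -> o
        line = line.replaceAll("\u00e9", "e"); // é -> e
        line = line.replaceAll("\u00ef", "i"); // ï -> i
        return line;
    }

    public static void main(String[] args) {
        String line = "4|G\u00e9t Shorty (1995)";                 // "4|Gét Shorty (1995)"
        line.replaceAll("\u00e9", "e");                          // new String is thrown away
        System.out.println(line.equals("4|Get Shorty (1995)")); // false: line is unchanged
        System.out.println(clean(line));                         // 4|Get Shorty (1995)
    }
}
```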

If you want to keep a map, iterate over its entries and replace each key with the value you want it mapped to:

Map<String, String> map = new HashMap<>();
map.put("ó", "o");
// ... and all the others

Later, in your loop reading the contents, you replace all the characters:

for (Map.Entry<String, String> entry : map.entrySet())
{
    String oldChar = entry.getKey();
    String newChar = entry.getValue();
    contentLine = contentLine.replaceAll(oldChar, newChar);
}
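One caveat with this loop: String#replaceAll treats its first argument as a regular expression. The accented letters here have no special meaning in a regex, so it works, but a key such as "." or "(" would misbehave. The sketch below swaps in String#replace, which replaces every literal occurrence. The applyMap name is hypothetical, and a LinkedHashMap is used only so the replacement order is predictable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LiteralReplace {
    // Hypothetical helper: replaces each key with its value, treating keys as plain text.
    static String applyMap(String line, Map<String, String> map) {
        for (Map.Entry<String, String> entry : map.entrySet()) {
            line = line.replace(entry.getKey(), entry.getValue()); // literal, not a regex
        }
        return line;
    }

    public static void main(String[] args) {
        System.out.println("a.b".replaceAll(".", "-")); // --- ("." matches any char)
        System.out.println("a.b".replace(".", "-"));    // a-b (literal dot only)

        Map<String, String> map = new LinkedHashMap<>();
        map.put("\u00f3", "o"); // ó -> o
        map.put("\u00e9", "e"); // é -> e
        map.put("\u00ef", "i"); // ï -> i
        System.out.println(applyMap("1|T\u00f3y Story (1995)", map)); // 1|Toy Story (1995)
    }
}
```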

Here is a complete example:

import java.io.BufferedReader;
import java.io.FileReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class Main {
    public static void main(String[] args) throws Exception {
        HashMap<String, String> nonAsciiToAscii = new HashMap<>();
        nonAsciiToAscii.put("ó", "o");
        nonAsciiToAscii.put("é", "e");
        nonAsciiToAscii.put("ï", "i");

        // Decode the file explicitly as UTF-8 (assumed to be the file's encoding;
        // this FileReader constructor needs Java 11+). try-with-resources closes it.
        try (BufferedReader br = new BufferedReader(
                new FileReader("movie-names.txt", StandardCharsets.UTF_8)))
        {
            String contentLine = br.readLine();
            while (contentLine != null)
            {
                for (Map.Entry<String, String> entry : nonAsciiToAscii.entrySet())
                {
                    String oldChar = entry.getKey();
                    String newChar = entry.getValue();
                    contentLine = contentLine.replaceAll(oldChar, newChar);
                }

                System.out.println(contentLine); // or whatever else you want to do with the cleaned lines

                contentLine = br.readLine();
            }
        }
    }
}

This prints:

robert:~$ javac Main.java && java Main
1|Toy Story (1995)
2|GoldenEye (1995)
3|Four Rooms (1995)
4|Get Shorty (1995)
robert:~$
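If the set of accented letters is open-ended, keeping the map up to date by hand gets tedious. An alternative that neither answer uses is java.text.Normalizer: decomposing to NFD splits "ó" into "o" plus a combining accent, and deleting all combining marks (regex category \p{M}) leaves the base letter. A sketch with a hypothetical stripAccents helper; it only handles characters that have a Unicode decomposition, so letters like ø or ß would still need an explicit map.

```java
import java.text.Normalizer;

class StripAccents {
    // Hypothetical helper: NFD-decompose, then remove combining marks (category M).
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        System.out.println(stripAccents("1|T\u00f3y Story (1995)"));  // 1|Toy Story (1995)
        System.out.println(stripAccents("4|G\u00e9t Shorty (1995)")); // 4|Get Shorty (1995)
    }
}
```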

Answer:

You want to flip your keys and values:

Map<Character, Character> charMap = new HashMap<>();
charMap.put('ó','o');
charMap.put('é','e');
charMap.put('ï','i');

and then get the mapped character:

char mappedChar = charMap.getOrDefault(inputChar, inputChar);

To get the chars of a string, call String#toCharArray() and look each one up in the map.
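Putting those pieces together, here is a sketch of a per-line clean method that uses the flipped map, toCharArray() and getOrDefault, and rebuilds the line in a StringBuilder. The names CHAR_MAP and clean are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class CharMapClean {
    // Non-ASCII char -> ASCII replacement (keys and values flipped vs. the question).
    static final Map<Character, Character> CHAR_MAP = new HashMap<>();
    static {
        CHAR_MAP.put('\u00f3', 'o'); // ó
        CHAR_MAP.put('\u00e9', 'e'); // é
        CHAR_MAP.put('\u00ef', 'i'); // ï
    }

    // Hypothetical helper: maps each char, falling back to the char itself if unmapped.
    static String clean(String line) {
        StringBuilder sb = new StringBuilder(line.length());
        for (char inputChar : line.toCharArray()) {
            char mappedChar = CHAR_MAP.getOrDefault(inputChar, inputChar);
            sb.append(mappedChar);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(clean("1|T\u00f3y Story (1995)")); // 1|Toy Story (1995)
    }
}
```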
