Filtering from csv files using Java stream

Time:04-27

I have a CSV file with characters from SW and would like to find the heaviest character using Java streams. Here's a sample of the file:

name;height;mass;hair_color;skin_color;eye_color;birth_year;gender
Luke Skywalker;172;77;blond;fair;blue;19BBY;male
C-3PO;167;75;n/a;gold;yellow;112BBY;n/a
R2-D2;96;32;n/a;white, blue;red;33BBY;n/a
Darth Vader;202;136;none;white;yellow;41.9BBY;male
Leia Organa;150;49;brown;light;brown;19BBY;female
Owen Lars;178;120;brown, grey;light;blue;52BBY;male
Beru Whitesun lars;165;75;brown;light;blue;47BBY;female
Grievous;216;159;none;brown, white;green, yellow;unknown;male
Finn;unknown;unknown;black;dark;dark;unknown;male
Rey;unknown;unknown;brown;light;hazel;unknown;female
Poe Dameron;unknown;unknown;brown;light;brown;unknown;male

The expected output is the String "Grievous".

Initially I thought of creating a Character class, where I could store the data and work with objects instead of a String array after splitting the line. However, each value can be unknown or n/a, so I'm not sure how to work around that. Is there a way to achieve this using streams only?

CodePudding user response:

I would not recommend doing this with Streams alone; a CSV library is much safer. That said, here is a Streams-only version:


import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Optional;
import java.util.stream.Stream;

public class HeaviestCharacter { // class name is arbitrary

    public static void main(String[] args) {
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(new FileReader(new File("characters.csv")))) {

            // Skip first line (the header row)
            reader.readLine();

            Optional<String> optionalHeaviestCharacter = getHeaviestCharactersName(reader.lines());

            System.out.println(optionalHeaviestCharacter);

        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static Optional<String> getHeaviestCharactersName(Stream<String> lineStream) {
        return lineStream
                .map(lineString -> lineString.split(";")) // map every line string to an array with all values
                .filter(values -> values.length > 2 && values[2].matches("[0-9]+")) // filter out characters with a non-number value as a mass
                .max((values1, values2) -> Integer.compare(Integer.parseInt(values1[2]), Integer.parseInt(values2[2]))) // get element with maximum mass
                .map(heaviestValues -> heaviestValues[0]); // map values array of heaviest character to its name
    }
}

First we read the file, which I have named characters.csv. You will probably need to edit the file path to point to your file.

BufferedReader reader = new BufferedReader(new FileReader(new File("characters.csv")));

Then we read all lines from the file, each line as a String in the Stream<String>, by calling the reader.lines() method.

The function getHeaviestCharactersName will then return an Optional<String>. The Optional will be empty when, for example, all characters have an unknown/invalid mass or when there are no characters present at all.

If you think that there will always be at least one character with a valid mass present, you can just get the name of the heaviest character with optionalHeaviestCharacter.get(). Otherwise, you have to check whether the Optional is empty first:

if (optionalHeaviestCharacter.isEmpty()) {
    System.out.println("Could not find a character with the heaviest mass");
} else {
    System.out.println("Heaviest character is " + optionalHeaviestCharacter.get());
}
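
The question also mentions a Character class and worries about values that are unknown or n/a. One possible way to handle that, sketched here under assumptions, is to store the mass as an OptionalInt that is empty whenever the field is not a plain number. The names SwCharacters, SwCharacter, parse and heaviestName are made up for this illustration, and only the name and mass fields are modelled.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.OptionalInt;

final class SwCharacters {

    /** Hypothetical value class holding only the two fields needed here. */
    static final class SwCharacter {
        final String name;
        final OptionalInt mass; // empty when the file says "unknown", "n/a", etc.

        SwCharacter(String name, OptionalInt mass) {
            this.name = name;
            this.mass = mass;
        }
    }

    /** Parses one data line; a non-numeric mass becomes an empty OptionalInt. */
    static SwCharacter parse(String line) {
        String[] values = line.split(";");
        String rawMass = values.length > 2 ? values[2].trim() : "";
        OptionalInt mass = rawMass.matches("[0-9]+")
                ? OptionalInt.of(Integer.parseInt(rawMass))
                : OptionalInt.empty();
        return new SwCharacter(values[0], mass);
    }

    /** Returns the name of the heaviest character with a known mass, if any. */
    static Optional<String> heaviestName(List<String> dataLines) {
        return dataLines.stream()
                .map(SwCharacters::parse)
                .filter(c -> c.mass.isPresent()) // drop characters without a numeric mass
                .max(Comparator.comparingInt((SwCharacter c) -> c.mass.getAsInt()))
                .map(c -> c.name);
    }

    public static void main(String[] args) {
        // Data lines only; the header row is assumed to be skipped already.
        List<String> dataLines = List.of(
                "Darth Vader;202;136;none;white;yellow;41.9BBY;male",
                "Grievous;216;159;none;brown, white;green, yellow;unknown;male",
                "Finn;unknown;unknown;black;dark;dark;unknown;male");
        System.out.println(heaviestName(dataLines).orElse("no character with a known mass"));
    }
}
```

With the three sample lines in main, this prints Grievous. The same pattern would extend to height or any other numeric column.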


CodePudding user response:

As others noted, I doubt streams are the best approach to your particular problem. But since you asked, just for fun, I gave it a try. After much web-searching, and much trial-and-error, I seem to have found a solution using streams.

We use NIO.2 classes Path & Files to open the data file.

We define a stream by calling Files.lines.

We omit the header row by calling Stream#skip.

Some of your input rows have the non-numeric value "unknown" in our target third field, so we call Stream#filter to ignore those lines. We extract the third field by calling String#split and indexing the resulting array with the annoying zero-based index number 2.

To get the highest number in our third column, one approach is to sort. To sort, we extract the third field in a Comparator created via Comparator.comparingInt. To get the needed int value, we parse the text of the third field using Integer.parseInt.

After sorting, we need to access the last element in the stream, as that should have our character with the greatest weight. This seems clumsy to me, but apparently the way to get the last element of a stream is .reduce( ( first , second ) -> second ).orElse( null ). I sure wish we had a Stream#last method!

That last element is a String object, a line of text from your input file. So we need to yet again split the string. But this time when we split, we take the first element rather than the third, as our goal is to report the character’s name. The first element is identified by the annoying zero-based index number of 0.

Voilà, we get Grievous as our final result.

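import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

Path path = Paths.get( "/Users/basil_dot_work/inputs.csv" );
if ( Files.notExists( path ) ) { throw new IllegalStateException( "Failed to find file at path: " + path ); }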

String result;
try ( Stream < String > lines = Files.lines( path , StandardCharsets.UTF_8 ) )  // try-with-resources closes the file when done.
{
    result =
            lines
                    .skip( 1L )  // Skip the header row, with column names.
                    .filter(  // Filter out lines whose targeted value is "unknown". We need text made up only of digits.
                            line -> ! line.split( ";" )[ 2 ].equalsIgnoreCase( "unknown" )
                    )
                    .sorted(  // Sort by extracting third field’s text, then parse to get an `int` value.
                            Comparator.comparingInt( ( String line ) -> Integer.parseInt( line.split( ";" )[ 2 ] ) )
                    )
                    .reduce( ( first , second ) -> second )  // Get last element.
                    .map( line -> line.split( ";" )[ 0 ] )  // Extract name of character from first field of the one line left after processing.
                    .orElse( null );  // null if no line had a usable mass, instead of a NullPointerException.
}
catch ( IOException e ) { throw new RuntimeException( e ); }

System.out.println( "result = " + result );

result = Grievous
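
As a possible variation on the steps above, the sort followed by "take the last element" can be collapsed into a single Stream#max call, and each line can be split only once. This is a sketch, not a drop-in replacement: the class name HeaviestByMax and the method heaviestName are invented for illustration, the method takes any Stream of lines (header row included) so it can be fed from Files.lines or from in-memory test data, and it accepts only all-digit masses rather than just rejecting "unknown".

```java
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

final class HeaviestByMax {

    /** Expects the file contents as lines, header row included. */
    static Optional<String> heaviestName(Stream<String> lines) {
        return lines
                .skip(1L)                                  // skip the header row
                .map(line -> line.split(";"))              // split each line exactly once
                .filter(fields -> fields.length > 2 && fields[2].matches("[0-9]+")) // keep numeric masses only
                .max(Comparator.comparingInt((String[] fields) -> Integer.parseInt(fields[2])))
                .map(fields -> fields[0]);                 // name is the first field
    }

    public static void main(String[] args) {
        Stream<String> lines = Stream.of(
                "name;height;mass;hair_color;skin_color;eye_color;birth_year;gender",
                "Luke Skywalker;172;77;blond;fair;blue;19BBY;male",
                "Grievous;216;159;none;brown, white;green, yellow;unknown;male",
                "Rey;unknown;unknown;brown;light;hazel;unknown;female");
        System.out.println("result = " + heaviestName(lines).orElse("(none)"));
    }
}
```

Because max does not need to order the whole stream, this avoids the full sort, and the Optional return value makes the "no usable rows" case explicit.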

Be sure to compare my approach here with that of the other Answer, by Florian Hartung. It may well be better; I have not yet studied it carefully.
