How to collect data from a stream in different lists based on a condition?

Time:07-12

I have a stream of data as shown below and I wish to collect the data based on a condition.

Stream of data:

452857;0;L100;csO;20220411;20220411;EUR;000101435; ; ;F;1;EUR;000100000; ;
452857;0;L120;csO;20220411;20220411;EUR;000101435; ; ;F;1;EUR;000100000; ;
452857;0;L121;csO;20220411;20220411;EUR;000101435; ; ;F;1;EUR;000100000; ;
452857;0;L126;csO;20220411;20220411;EUR;000101435; ; ;F;1;EUR;000100000; ;
452857;0;L100;csO;20220411;20220411;EUR;000101435; ; ;F;1;EUR;000100000; ;
452857;0;L122;csO;20220411;20220411;EUR;000101435; ; ;F;1;EUR;000100000; ;

I want to group the lines by the value at index 2 (L100, L120, L121, ...) and store them in separate lists, one per value (L120, L121, L122, etc.), using Java 8 streams. Any suggestions? Note: the splittedLine array below holds my stream of data.

For instance: I have tried the following but I think there's a shorter way:

List<String> L100_ENTITY_NAMES = Arrays.asList("L100", "L120", "L121", "L122", "L126");


 List<List<String>> list=  L100_ENTITY_NAMES.stream()
                            .map(entity -> Arrays.stream(splittedLine)
                                    .filter(line -> {
                                        String[] values =  line.split(String.valueOf(DELIMITER));
                                        if(values.length > 0){
                                            return entity.equals(values[2]);
                                        }
                                        else{
                                            return false;
                                        }
                                    }).collect(Collectors.toList())).collect(Collectors.toList());

CodePudding user response:

I'd rather change the order and also collect the data into a Map<String, List<String>> where the key would be the entity name.

Assuming splittedLine is the array of lines, I'd probably do something like this:

// Set.of requires Java 9+; on Java 8 use new HashSet<>(Arrays.asList(...)) instead
Set<String> L100_ENTITY_NAMES = Set.of("L100", ...);
String delimiter = String.valueOf(DELIMITER);

Map<String, List<String>> result =
  Arrays.stream(splittedLine)
      .map(line -> {
        String[] values = line.split(delimiter);
        // lines without a third field cannot be grouped
        if (values.length < 3) {
          return null;
        }

        // key = entity name, value = the original line
        return new AbstractMap.SimpleEntry<>(values[2], line);
      })
      .filter(Objects::nonNull)
      .filter(entry -> L100_ENTITY_NAMES.contains(entry.getKey()))
      .collect(Collectors.groupingBy(Map.Entry::getKey,
                 Collectors.mapping(Map.Entry::getValue, Collectors.toList())));

Note that this isn't necessarily shorter but has a couple of other advantages:

  • It's not O(n*m) for n lines and m entity names, but roughly O(n) with a hash-based set (or O(n * log(m)) with a tree-based one), so it should be faster for non-trivial stream sizes
  • You get an entity name for each list rather than having to rely on the indices in both lists
  • It's easier to understand because you use distinct steps:
    • split and map the line
    • filter null values, i.e. lines that aren't valid in the first place
    • filter lines that don't have any of the L100 entity names
    • collect the filtered lines by entity name so you can easily access the sub lists
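
For reference, the steps above can be sketched as one self-contained class that also compiles on Java 8. This is only an illustration under a few assumptions: the class name GroupByEntityName and the helper toEntry are made up for the example, the delimiter is assumed to be ";", and a HashSet stands in for Set.of.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

final class GroupByEntityName {
    // Assumption: the question's DELIMITER constant is a semicolon.
    static final String DELIMITER = ";";

    // Java 8 compatible stand-in for Set.of(...).
    static final Set<String> ENTITY_NAMES =
            new HashSet<>(Arrays.asList("L100", "L120", "L121", "L122", "L126"));

    // Hypothetical helper: pairs the entity name (index 2) with the whole line,
    // or returns null when the line has fewer than three fields.
    static Map.Entry<String, String> toEntry(String line) {
        String[] values = line.split(DELIMITER);
        if (values.length < 3) {
            return null;
        }
        return new AbstractMap.SimpleEntry<>(values[2], line);
    }

    static Map<String, List<String>> group(String[] lines) {
        return Arrays.stream(lines)
                .map(GroupByEntityName::toEntry)                     // split and map
                .filter(Objects::nonNull)                            // drop invalid lines
                .filter(e -> ENTITY_NAMES.contains(e.getKey()))      // keep known names
                .collect(Collectors.groupingBy(Map.Entry::getKey,    // group by name
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }
}
```

Calling GroupByEntityName.group(splittedLine) then gives direct access by name, for example result.get("L100"); a name with no matching lines is simply absent from the map.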

CodePudding user response:

You're effectively asking for what languages like Scala provide on collections: groupBy. In Scala you could write:

splitLines.groupBy(_(2)) // Map[String, List[String]]

Of course, you want this in Java. In my opinion, not using streams here makes sense, because the stream equivalent (Collectors.groupingBy) is more verbose than Scala's groupBy and a plain loop does the job.

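HashMap<String, ArrayList<String>> map = new HashMap<>();
for (String[] line : splitLines) {
    // need at least three fields, because index 2 is the grouping key
    if (line.length < 3) continue;
    ArrayList<String> xs = map.getOrDefault(line[2], new ArrayList<>());
    // note: addAll appends every field of the row to the group's list
    xs.addAll(Arrays.asList(line));
    map.put(line[2], xs);
}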

As you can see, it's very easy to understand, and actually shorter than the stream-based solution.

I'm leveraging two key methods on a HashMap.

  • The first is getOrDefault; basically, if no value is associated with our key yet, we can provide a default. In our case, that is an empty ArrayList.

  • The second is put, which actually acts like a putOrReplace because it lets us overwrite the previous value associated with the key.
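
As a possible variation (not what the loop above does), Map.computeIfAbsent, also available since Java 8, can replace the getOrDefault/put pair. The sketch below makes its own assumptions: the class name GroupRowsWithLoop is invented for the example, the input is taken to be already-split rows, and each row is stored whole instead of having its fields appended.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class GroupRowsWithLoop {
    // Groups already-split rows by the field at index 2, keeping each row intact.
    static Map<String, List<String[]>> groupRows(String[][] splitLines) {
        Map<String, List<String[]>> map = new HashMap<>();
        for (String[] row : splitLines) {
            if (row.length < 3) {
                continue; // index 2 must exist
            }
            // Creates the list the first time a key is seen, then appends the row.
            map.computeIfAbsent(row[2], key -> new ArrayList<>()).add(row);
        }
        return map;
    }
}
```

Because computeIfAbsent returns the list that is already stored in the map, no separate put is needed.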

I hope that was helpful. :)

CodePudding user response:

You're asking for a shorter way to achieve the same result, but your code is actually good. I guess the only part that makes it look lengthy is the if/else check in the stream.

    if (values.length > 0) {
        return entity.equals(values[2]);
    } else {
        return false;
    }

I would suggest introducing two tiny private methods to improve the readability, like this:

    List<List<String>> list = L100_ENTITY_NAMES.stream()
    .map(entity -> getLinesByEntity(splittedLine, entity)).collect(Collectors.toList());

    private List<String> getLinesByEntity(String[] splittedLine, String entity) {
        return Arrays.stream(splittedLine).filter(line -> isLineMatched(entity, line)).collect(Collectors.toList());
    }

    private boolean isLineMatched(String entity, String line) {
        String[] values = line.split(String.valueOf(DELIMITER));
        // index 2 is read, so at least three fields are required
        return values.length > 2 && entity.equals(values[2]);
    }
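
If the List<List<String>> shape from the question is needed, one option is to group the lines once and then look up each entity name in order, so the input is not re-scanned per entity. This is just a sketch under assumptions: ListsInNameOrder, linesPerEntity and hasEntity are hypothetical names, and the delimiter is assumed to be ";".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class ListsInNameOrder {
    // Assumption: the question's DELIMITER constant is a semicolon.
    static final String DELIMITER = ";";

    // True when the line has a third field to group by.
    static boolean hasEntity(String line) {
        return line.split(DELIMITER).length > 2;
    }

    static List<List<String>> linesPerEntity(String[] lines, List<String> entityNames) {
        // One pass over the input: group every valid line by its third field.
        // (Each line is split twice here, which keeps the sketch short.)
        Map<String, List<String>> byName = Arrays.stream(lines)
                .filter(ListsInNameOrder::hasEntity)
                .collect(Collectors.groupingBy(line -> line.split(DELIMITER)[2]));

        // One list per requested name, in the given order; empty if nothing matched.
        return entityNames.stream()
                .map(name -> byName.getOrDefault(name, Collections.emptyList()))
                .collect(Collectors.toList());
    }
}
```

The result has the same positional layout as the original approach: the list at index i belongs to the entity name at index i.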

CodePudding user response:

I would convert the semicolon-delimited lines to objects as soon as possible, instead of keeping them around as a serialized bunch of data.

  1. First, I would create a record modelling our data:

    public record LBasedEntity(long id, int zero, String lcode, …) { }
    
  2. Then, create a method to parse the line. This could also be done with an external parsing library, since this looks like CSV with a semicolon as the delimiter.

    private static LBasedEntity parse(String line) {
        String[] parts = line.split(";");
        if (parts.length < 3) {
            return null;
        }
    
        long id = Long.parseLong(parts[0]);
        int zero = Integer.parseInt(parts[1]);
        String lcode = parts[2];
        …
        return new LBasedEntity(id, zero, lcode, …);
    }
    
  3. Then the mapping is trivial:

    Map<String, List<LBasedEntity>> result = Arrays.stream(lines)
        .map(line -> parse(line))
        .filter(Objects::nonNull)
        .filter(lBasedEntity -> L100_ENTITY_NAMES.contains(lBasedEntity.lcode()))
        .collect(Collectors.groupingBy(LBasedEntity::lcode));
    
    • map(line -> parse(line)) parses the line into an LBasedEntity object (or whatever you call it);
    • filter(Objects::nonNull) filters out all null values produced by the parse method;
    • The next filter selects all entities whose lcode property is contained in the L100_ENTITY_NAMES list (I would turn this into a Set, to speed things up);
    • Then a Map is created with key-value pairs of L100_ENTITY_NAME → List<LBasedEntity>.
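
To make the three steps concrete, here is a minimal runnable sketch. It needs Java 16 or later for records, and it rests on assumptions: ShortEntity is a hypothetical, reduced stand-in that models only the first three fields (the remaining fields of the real record are not known here), and RecordGrouping, parse and group are names chosen for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

final class RecordGrouping {
    // Hypothetical reduced model: only the first three fields of a line.
    record ShortEntity(long id, int zero, String lcode) { }

    // Returns null for lines with fewer than three fields.
    // Non-numeric values in the first two fields would throw NumberFormatException.
    static ShortEntity parse(String line) {
        String[] parts = line.split(";");
        if (parts.length < 3) {
            return null;
        }
        return new ShortEntity(Long.parseLong(parts[0]), Integer.parseInt(parts[1]), parts[2]);
    }

    static Map<String, List<ShortEntity>> group(String[] lines, Set<String> entityNames) {
        return Arrays.stream(lines)
                .map(RecordGrouping::parse)                          // line -> object
                .filter(Objects::nonNull)                            // drop unparsable lines
                .filter(e -> entityNames.contains(e.lcode()))        // keep requested codes
                .collect(Collectors.groupingBy(ShortEntity::lcode)); // lcode -> entities
    }
}
```

A call such as RecordGrouping.group(lines, Set.of("L100", "L121")) returns typed objects per code, so later code can read e.id() instead of splitting strings again.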