I have a stream of data as shown below and I wish to collect the data based on a condition.
Stream of data:
452857;0;L100;csO;20220411;20220411;EUR;000101435; ; ;F;1;EUR;000100000; ;
452857;0;L120;csO;20220411;20220411;EUR;000101435; ; ;F;1;EUR;000100000; ;
452857;0;L121;csO;20220411;20220411;EUR;000101435; ; ;F;1;EUR;000100000; ;
452857;0;L126;csO;20220411;20220411;EUR;000101435; ; ;F;1;EUR;000100000; ;
452857;0;L100;csO;20220411;20220411;EUR;000101435; ; ;F;1;EUR;000100000; ;
452857;0;L122;csO;20220411;20220411;EUR;000101435; ; ;F;1;EUR;000100000; ;
I wish to collect the data based on index 2 (L100, L121, ...) and store each group in a separate list (L120, L121, L122, etc.) using Java 8 streams. Any suggestions? Note: the splittedLine array below is my stream of data.
For instance: I have tried the following but I think there's a shorter way:
List<String> L100_ENTITY_NAMES = Arrays.asList("L100", "L120", "L121", "L122", "L126");
List<List<String>> list = L100_ENTITY_NAMES.stream()
    .map(entity -> Arrays.stream(splittedLine)
        .filter(line -> {
            String[] values = line.split(String.valueOf(DELIMITER));
            if (values.length > 0) {
                return entity.equals(values[2]);
            } else {
                return false;
            }
        })
        .collect(Collectors.toList()))
    .collect(Collectors.toList());
CodePudding user response:
I'd rather change the order and also collect the data into a Map<String, List<String>>
where the key would be the entity name.
Assuming splittedLine is the array of lines, I'd probably do something like this:
Set<String> L100_ENTITY_NAMES = Set.of("L100", ...);
String delimiter = String.valueOf(DELIMITER);
Map<String, List<String>> result =
    Arrays.stream(splittedLine)
        .map(line -> {
            String[] values = line.split(delimiter);
            if (values.length < 3) {
                return null;
            }
            return new AbstractMap.SimpleEntry<>(values[2], line);
        })
        .filter(Objects::nonNull)
        .filter(entry -> L100_ENTITY_NAMES.contains(entry.getKey()))
        .collect(Collectors.groupingBy(Map.Entry::getKey,
                Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
Note that this isn't necessarily shorter but has a couple of other advantages:
- It's not O(n*m) but rather O(n), since the Set lookup is effectively constant-time, so it should be faster for non-trivial stream sizes
- You get an entity name for each list rather than having to rely on the indices in both lists
- It's easier to understand because you use distinct steps:
- split and map the line
- filter null values, i.e. lines that aren't valid in the first place
- filter lines that don't have any of the L100 entity names
- collect the filtered lines by entity name so you can easily access the sublists
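Putting those steps together, a self-contained sketch might look like this (the class name and sample data are mine, for illustration; Set.of requires Java 9+):

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

public class GroupByEntityDemo {

    // Groups raw semicolon-delimited lines by the entity name at field index 2.
    static Map<String, List<String>> groupByEntity(String[] lines, Set<String> entityNames) {
        return Arrays.stream(lines)
                .map(line -> {
                    String[] values = line.split(";");
                    // Map invalid lines (no field at index 2) to null
                    return values.length < 3
                            ? null
                            : new AbstractMap.SimpleEntry<>(values[2], line);
                })
                .filter(Objects::nonNull)
                .filter(entry -> entityNames.contains(entry.getKey()))
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }

    public static void main(String[] args) {
        String[] lines = {
                "452857;0;L100;csO;20220411",
                "452857;0;L120;csO;20220411",
                "452857;0;L100;csO;20220411"
        };
        Map<String, List<String>> result =
                groupByEntity(lines, Set.of("L100", "L120", "L121", "L122", "L126"));
        System.out.println(result.get("L100").size()); // 2
    }
}
```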
CodePudding user response:
You're effectively asking for what languages like Scala provide on collections: groupBy. In Scala you could write:
splitLines.groupBy(_(2)) // Map[String, List[String]]
Of course, you want this in Java, and in my opinion, not using streams here makes sense due to Java's lack of a fold or groupBy function.
HashMap<String, ArrayList<String>> map = new HashMap<>();
for (String[] line : splitLines) {
    if (line.length < 3) continue;
ArrayList<String> xs = map.getOrDefault(line[2], new ArrayList<>());
xs.addAll(Arrays.asList(line));
map.put(line[2], xs);
}
As you can see, it's very easy to understand, and actually shorter than the stream based solution.
As you can see, it's very easy to understand, and actually shorter than the stream based solution. I'm leveraging two key methods on a HashMap.
The first is getOrDefault: if the value associated with our key doesn't exist, we can provide a default, in our case an empty ArrayList.
The second is put, which acts like a putOrReplace because it lets us override the previous value associated with the key.
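As an aside (my suggestion, not part of the answer above), the getOrDefault/put pair can also be collapsed into a single computeIfAbsent call. A minimal sketch with made-up data:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupLoopDemo {

    // Same grouping loop, using computeIfAbsent instead of getOrDefault + put.
    static Map<String, List<String>> group(String[][] splitLines) {
        Map<String, List<String>> map = new HashMap<>();
        for (String[] line : splitLines) {
            if (line.length < 3) continue;
            // computeIfAbsent creates the list on first sight of the key,
            // so no separate getOrDefault/put round-trip is needed.
            // Note: addAll flattens every field of the line into one list,
            // matching the answer's behavior.
            map.computeIfAbsent(line[2], k -> new ArrayList<>())
               .addAll(Arrays.asList(line));
        }
        return map;
    }

    public static void main(String[] args) {
        String[][] splitLines = {
                {"452857", "0", "L100"},
                {"452857", "0", "L120"},
                {"452857", "0", "L100"}
        };
        System.out.println(group(splitLines).get("L100").size()); // 6 (2 lines x 3 fields)
    }
}
```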
I hope that was helpful. :)
CodePudding user response:
You're asking for a shorter way to achieve the same thing; actually, your code is good. I guess the only part that makes it look lengthy is the if/else check in the stream.
if (values.length > 0) {
return entity.equals(values[2]);
} else {
return false;
}
I would suggest introducing two tiny private methods to improve readability, like this:
List<List<String>> list = L100_ENTITY_NAMES.stream()
.map(entity -> getLinesByEntity(splittedLine, entity)).collect(Collectors.toList());
private List<String> getLinesByEntity(String[] splittedLine, String entity) {
return Arrays.stream(splittedLine).filter(line -> isLineMatched(entity, line)).collect(Collectors.toList());
}
private boolean isLineMatched(String entity, String line) {
    String[] values = line.split(String.valueOf(DELIMITER));
    return values.length > 2 && entity.equals(values[2]);
}
CodePudding user response:
I would convert the semicolon-delimited lines to objects as soon as possible, instead of keeping them around as a serialized bunch of data.
First, I would create a model modelling our data:
public record LBasedEntity(long id, int zero, String lcode, …) { }
Then, create a method to parse the line. This could just as well be an external parsing library, since the format looks like CSV with a semicolon as the delimiter.
private static LBasedEntity parse(String line) {
    String[] parts = line.split(";");
    if (parts.length < 3) {
        return null;
    }
    long id = Long.parseLong(parts[0]);
    int zero = Integer.parseInt(parts[1]);
    String lcode = parts[2];
    …
    return new LBasedEntity(id, zero, lcode, …);
}
Then the mapping is trivial:
Map<String, List<LBasedEntity>> result = Arrays.stream(lines)
    .map(line -> parse(line))
    .filter(Objects::nonNull)
    .filter(lBasedEntity -> L100_ENTITY_NAMES.contains(lBasedEntity.lcode()))
    .collect(Collectors.groupingBy(LBasedEntity::lcode));
- map(line -> parse(line)) parses each line into an LBasedEntity object (or whatever you call it);
- filter(Objects::nonNull) filters out all null values produced by the parse method;
- the next filter selects all entities whose lcode property is contained in the L100_ENTITY_NAMES list (I would turn this into a Set, to speed things up);
- then a Map is built with key-value pairs of L100_ENTITY_NAME → List<LBasedEntity>.
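Putting the record-based approach together, a runnable sketch might look like this (assumes Java 16+ for records; only the first three fields are modelled, and the sample data is mine):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

public class RecordGroupDemo {

    // Trimmed-down model: only the first three fields of a line.
    public record LBasedEntity(long id, int zero, String lcode) { }

    // Returns null for lines that don't have at least three fields.
    static LBasedEntity parse(String line) {
        String[] parts = line.split(";");
        if (parts.length < 3) {
            return null;
        }
        return new LBasedEntity(Long.parseLong(parts[0]),
                                Integer.parseInt(parts[1]),
                                parts[2]);
    }

    static Map<String, List<LBasedEntity>> group(String[] lines, Set<String> entityNames) {
        return Arrays.stream(lines)
                .map(RecordGroupDemo::parse)
                .filter(Objects::nonNull)
                .filter(e -> entityNames.contains(e.lcode()))
                .collect(Collectors.groupingBy(LBasedEntity::lcode));
    }

    public static void main(String[] args) {
        Set<String> entityNames = Set.of("L100", "L120", "L121", "L122", "L126");
        String[] lines = {
                "452857;0;L100;csO;20220411",
                "452857;0;L120;csO;20220411",
                "452857;0;L100;csO;20220411",
                "bad-line"                      // dropped by parse()
        };
        Map<String, List<LBasedEntity>> result = group(lines, entityNames);
        System.out.println(result.get("L100").size()); // 2
    }
}
```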