Suppose that I have a list:
List<String> dest = Arrays.asList(
"abc abd 2000",
"idf owe 1200",
"jks ldg 789",
"ccc hhh 2000",
"www uuu 1000"
);
And I'm trying to get the number at the end of every string. The given list has only integers in it, but I'm writing the regex for doubles too: (\\d+\\.?\\d+). In Java 1.8, I wrote the following code:
ArrayList<String> mylist = new ArrayList<>(
dest.stream()
.filter(Pattern.compile("\\D+\\s\\D+\\s(\\d+\\.?\\d+)").asPredicate())
.collect(Collectors.toList())
);
What I'm trying to do is get the (\\d+\\.?\\d+) group from each matched string. How can I do it?
I was thinking about applying a Matcher to each element of the list, but I'm not sure how to implement it.
CodePudding user response:
I'm trying to get the number at the end of every string...
Solution 1
Maybe you can solve it without using regex, like so:
List<String> response = dest.stream()
.map(String::trim)
.map(s -> s.split("\\s+"))
.map(r -> r[r.length - 1])
.collect(Collectors.toList()); // Stream.toList() would require Java 16+; the question targets Java 8
Solution 2
If you insist on using regex, you can use:
final String regex = "\\D+\\s\\D+\\s(\\d+\\.?\\d+)";
final Pattern compile = Pattern.compile(regex);
List<String> response = dest.stream()
.map(compile::matcher)
.filter(Matcher::find)
.map(r -> r.group(1))
.collect(Collectors.toList()); // Stream.toList() would require Java 16+; the question targets Java 8
Outputs
[2000, 1200, 789, 2000, 1000]
CodePudding user response:
filter keeps or removes elements from the stream. If you want to transform stream elements (which you do when you extract the number), use map.
Then you can use the regex (along with a Matcher) to extract data:
Pattern p = Pattern.compile("\\D+\\s\\D+\\s(\\d+\\.?\\d+)");
List<String> mylist = dest.stream()
.map(s -> {
Matcher matcher = p.matcher(s);
matcher.find();
return matcher.group(1); // throws IllegalStateException if find() failed; error handling sold separately
})
.collect(Collectors.toList());
System.out.println(mylist);
prints
[2000, 1200, 789, 2000, 1000]
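The snippet above ignores the result of matcher.find(), so a string without a match would fail inside group(1) with a fairly unhelpful IllegalStateException. One possible way to handle that, as a minimal sketch: move the extraction into a helper that fails with a descriptive message. The class name ExtractOrFail and the helper extractNumber are hypothetical names chosen for illustration; the regex is the one from the question, and the code sticks to Java 8 APIs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ExtractOrFail {
    // Regex from the question: two non-digit words, then the captured number.
    static final Pattern P = Pattern.compile("\\D+\\s\\D+\\s(\\d+\\.?\\d+)");

    // Hypothetical helper: returns capture group 1, or throws if the string does not match.
    public static String extractNumber(String s) {
        Matcher m = P.matcher(s);
        if (!m.find()) {
            throw new IllegalArgumentException("No number found in: \"" + s + "\"");
        }
        return m.group(1);
    }

    public static void main(String[] args) {
        List<String> dest = Arrays.asList("abc abd 2000", "idf owe 1200", "jks ldg 789");
        List<String> numbers = dest.stream()
                .map(ExtractOrFail::extractNumber)
                .collect(Collectors.toList());
        System.out.println(numbers); // [2000, 1200, 789]
    }
}
```

Whether to throw or to skip invalid elements silently depends on the use case; throwing makes bad input visible early.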
CodePudding user response:
You should use .map() instead of .filter().
ArrayList<String> mylist = new ArrayList<>(
dest.stream()
.map(s -> s.replaceAll("\\D+\\s\\D+\\s(\\d+\\.?\\d+)", "$1"))
.collect(Collectors.toList()));
System.out.println(mylist);
output:
[2000, 1200, 789, 2000, 1000]
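One caveat with the replaceAll approach is worth illustrating: when a string does not match the regex, replaceAll returns it unchanged, so invalid elements end up in the result as-is. The sketch below shows this with an input that is not part of the original list ("idf owe" is an assumed example); the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ReplaceAllCaveat {
    // Regex from the question.
    static final String REGEX = "\\D+\\s\\D+\\s(\\d+\\.?\\d+)";

    // Replaces each matching string with its captured number; non-matching strings pass through.
    public static List<String> extract(List<String> input) {
        return input.stream()
                .map(s -> s.replaceAll(REGEX, "$1"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "idf owe" contains no number, so replaceAll leaves it untouched.
        System.out.println(extract(Arrays.asList("abc abd 2000", "idf owe"))); // [2000, idf owe]
    }
}
```

If the input may contain strings without a number, a filter step or an explicit Matcher check would be needed in addition.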
CodePudding user response:
Firstly, there are several issues worth emphasizing:
In your code you started with a filter() operation, and that was the right step, because map() can't discard an element from the stream; it performs a one-to-one transformation. If you need to make sure that an element is valid, you need to apply filter() first. Another option is the Java 16 operation mapMulti(), which can transform a stream element into 0+ (zero or more) elements and can act like both filter() and map(). More importantly, it would improve performance, because it avoids processing a valid string with the regex engine twice.
Another important thing to consider is that there's more than one valid representation of a double: exponential, hexadecimal, etc. All options are listed in the documentation of Double.valueOf(). Even if we are talking only about plain decimal floating-point numbers, 1. and .9, which don't match the regular expression you're using, are also valid doubles. Note that the Javadoc of valueOf() contains a ready-to-go regex for validating doubles, which covers all the possible forms (it's really massive and sprinkled with comments, so I'm not posting it here).
In the regex, you're not testing whether the number is located at the end of the string. In the description you've said: "I'm trying to get the number at the end of every string". If that's important, you need to append $ to the regex, denoting that the captured string should be located at the very end.
Lastly, it's usually redundant to wrap the List returned by Collectors.toList() with an ArrayList. If you do need an ArrayList specifically, Collectors.toCollection(ArrayList::new) produces one directly.
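To make the point about double representations concrete, here is a small sketch that compares what Double.parseDouble() accepts with what the number part of the question's regex (\\d+\\.?\\d+) matches. The class and method names are hypothetical, and the sample strings are assumed examples rather than data from the question.

```java
import java.util.regex.Pattern;

class DoubleForms {
    // The number part of the regex from the question.
    static final Pattern QUESTION_NUMBER = Pattern.compile("\\d+\\.?\\d+");

    // True if Double.parseDouble accepts the string.
    public static boolean parsesAsDouble(String s) {
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // True if the whole string matches the question's number pattern.
    public static boolean matchesQuestionRegex(String s) {
        return QUESTION_NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        // "1.", ".9" and "1e3" are valid doubles, but the question's regex rejects them.
        for (String s : new String[] {"1.", ".9", "1e3", "2000", "99.5"}) {
            System.out.println(s + " -> parses: " + parsesAsDouble(s)
                    + ", regex: " + matchesQuestionRegex(s));
        }
    }
}
```

Note that the question's pattern also requires at least two digits, so a single-digit number such as 5 would not match it either.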
For simplicity, instead of the mentioned regex for checking all the forms of double, I would use the following regular expression: "(\\d+\\.?\\d*|\\.\\d+)$".
The first part, \\d+\\.?\\d*, matches a whole number, e.g. 999, a floating-point number with no fraction part, e.g. 1., and a regular floating-point number, e.g. 99.999.
The second part, \\.\\d+, matches a floating-point number without the integer part, e.g. .995.
Here's how it can be done using Java 16 Stream API features and java.util.regex.MatchResult (although you mentioned Java 8, I guess this solution can be useful for other readers):
List<String> strings = List.of(
"abc abd 2000", "idf owe 1200", "jks ldg 789",
"ccc hhh 2000", "www uuu 1000", "ccc hhh 2000.",
"abc abd 2000.1", "idf owe 1200.0", "jks ldg 789.995", ".999", // floating-point numbers
"abc abd 2000y", "idf owe", "jks ldg 789.995%wtqop", "....ljsofo." // invalid strings
);
Pattern p = Pattern.compile("(\\d+\\.?\\d*|\\.\\d+)$");
List<String> numbers = strings.stream()
.<String>mapMulti((str, consumer) ->
p.matcher(str).results().map(MatchResult::group).forEach(consumer)
)
.toList();
System.out.println(numbers);
Output:
[2000, 1200, 789, 2000, 1000, 2000., 2000.1, 1200.0, 789.995, .999]
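Since the question targets Java 8, where mapMulti() and Matcher.results() are not available, a roughly equivalent version can be sketched with flatMap(). It still runs the regex only once per string and drops the strings that don't end with a number. The class and method names are hypothetical, and the input is a shortened subset of the list above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TrailingNumbersJava8 {
    // Same end-anchored regex as in the Java 16 example.
    static final Pattern P = Pattern.compile("(\\d+\\.?\\d*|\\.\\d+)$");

    // Emits one element per string that ends with a number, and none otherwise.
    public static List<String> trailingNumbers(List<String> strings) {
        return strings.stream()
                .flatMap(s -> {
                    Matcher m = P.matcher(s);
                    return m.find() ? Stream.of(m.group()) : Stream.<String>empty();
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList(
                "abc abd 2000", "ccc hhh 2000.", ".999", "abc abd 2000y", "idf owe");
        System.out.println(trailingNumbers(strings)); // [2000, 2000., .999]
    }
}
```

The trade-off compared to mapMulti() is that flatMap() creates a small stream per element, which is usually negligible for lists of this size.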