Java function to parse all doubles from string

Time:05-31

I know this has been asked before¹, but the answers don't seem to cover all corner cases.

I tried implementing the suggestion¹ with the test case

String("Doubles -1.0, 0, 1, 1.12345 and 2.50")

It should return

[-1, 0, 1, 1.12345, 2.50]:

import java.util.Scanner;
import java.util.ArrayList;
import java.util.Locale;
public class Main
{
    public static void main(String[] args) {
        String string = new String("Doubles -1.0, 0, 1, 1.12345 and 2.50");
        System.out.println(string);
        ArrayList<Double> doubles = getDoublesFromString(string);
        System.out.println(doubles);
    }
    
    public static ArrayList<Double> getDoublesFromString(String string){
        Scanner parser = new Scanner(string);
        parser.useLocale(Locale.US);
        ArrayList<Double> doubles = new ArrayList<Double>();
        double currentDouble;
        while (parser.hasNext()){
            if(parser.hasNextDouble()){
                currentDouble = parser.nextDouble();
                doubles.add(currentDouble);
            }
            else {
                parser.next();
            }
        }
        parser.close();
        return doubles;
    }
}

Instead, the code above returns [1.12345, 2.5].

Did I implement it wrong? How do I fix it so that negative numbers and zeros are caught too?
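One way to see where the values go missing is to print each token that the default (whitespace-delimited) Scanner produces, together with what hasNextDouble() reports for it. This is only a diagnostic sketch: the class name TokenDump and the helper classify are hypothetical, and the Locale.US setting mirrors the question's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

public class TokenDump {
    // Returns one line per token: the token text and whether Scanner
    // (with the default whitespace delimiter and Locale.US) accepts it as a double.
    static List<String> classify(String input) {
        List<String> lines = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            sc.useLocale(Locale.US);
            while (sc.hasNext()) {
                boolean isDouble = sc.hasNextDouble(); // check before consuming the token
                lines.add(sc.next() + " -> " + isDouble);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        classify("Doubles -1.0, 0, 1, 1.12345 and 2.50").forEach(System.out::println);
    }
}
```

Tokens such as "-1.0," and "0," keep their trailing comma, so hasNextDouble() returns false for them and the loop skips them with next(); only "1.12345" and "2.50" pass.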

Answer:

I would use a regex find-all approach here:

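import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String string = new String("Doubles -1.0, 0, 1, 1.12345 and 2.50");
        List<String> nums = new ArrayList<>();

        // optional minus sign, one or more digits, optional fractional part
        String pattern = "-?\\d+(?:\\.\\d+)?";
        Pattern r = Pattern.compile(pattern);
        Matcher m = r.matcher(string);

        while (m.find()) {
            nums.add(m.group());
        }

        System.out.println(nums);  // [-1.0, 0, 1, 1.12345, 2.50]
    }
}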

By the way, your question uses the String constructor, which is seldom needed (a string literal is already a String), but it is interesting to see, especially for those of us who never use it.

Here is an explanation of the regex pattern:

-?              match an optional leading negative sign
\\d+            match one or more digits (the integer part)
(?:\\.\\d+)?    match an optional decimal point followed by one or more digits
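Since the question asks for doubles rather than strings, each match can be passed straight to Double.parseDouble. The sketch below reuses the question's method name getDoublesFromString; the class name RegexDoubles is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDoubles {
    // Same pattern as above: optional minus, digits, optional fractional part.
    private static final Pattern NUMBER = Pattern.compile("-?\\d+(?:\\.\\d+)?");

    public static List<Double> getDoublesFromString(String input) {
        List<Double> doubles = new ArrayList<>();
        Matcher m = NUMBER.matcher(input);
        while (m.find()) {
            // every match is a plain ASCII decimal literal, so parseDouble accepts it
            doubles.add(Double.parseDouble(m.group()));
        }
        return doubles;
    }

    public static void main(String[] args) {
        // prints [-1.0, 0.0, 1.0, 1.12345, 2.5]
        System.out.println(getDoublesFromString("Doubles -1.0, 0, 1, 1.12345 and 2.50"));
    }
}
```

The pattern deliberately stays simple: it does not recognize a leading +, a value such as .5 with no integer part, or exponents such as 1e3 (which would come out as two separate numbers).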

Answer:

For your specific example, adding this line right after constructing the scanner is sufficient: parser.useDelimiter("\\s|,");

The problem in your code is that tokens with a trailing comma, such as "-1.0,", are not recognized as valid doubles. The line above configures the scanner to treat commas, not only whitespace, as token delimiters, so the comma is no longer part of the token and what remains is a valid double that parses successfully.
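Here is the question's method with that change applied, as a minimal sketch. It uses the delimiter "[\\s,]+" rather than "\\s|," (an editorial variant, not the answer's exact line) so that a comma followed by a space is consumed as a single delimiter; the class name ScannerDoubles is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

public class ScannerDoubles {
    public static List<Double> getDoublesFromString(String input) {
        List<Double> doubles = new ArrayList<>();
        // try-with-resources closes the Scanner for us
        try (Scanner parser = new Scanner(input)) {
            parser.useLocale(Locale.US);
            // Any run of whitespace and/or commas counts as one delimiter,
            // so "-1.0," is read as the token "-1.0".
            parser.useDelimiter("[\\s,]+");
            while (parser.hasNext()) {
                if (parser.hasNextDouble()) {
                    doubles.add(parser.nextDouble());
                } else {
                    parser.next(); // skip words such as "Doubles" and "and"
                }
            }
        }
        return doubles;
    }

    public static void main(String[] args) {
        System.out.println(getDoublesFromString("Doubles -1.0, 0, 1, 1.12345 and 2.50"));
    }
}
```

One side effect to keep in mind: because commas are now delimiters, a grouped number such as 1,234 is read as two values, 1.0 and 234.0.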

I believe this is the most appropriate solution because matching all doubles is actually complex. Below is the regex that Scanner uses to do that; see how complicated it really is. Compared to splitting the string and calling Double.parseDouble on each piece, this approach is similar but involves less custom code and, more importantly, no exception throwing, which is slow.

(([-+]?((((([0-9\p{javaDigit}]))++)|([\p{javaDigit}&&[^0]](([0-9\p{javaDigit}]))?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))|(((([0-9\p{javaDigit}]))++)|([\p{javaDigit}&&[^0]](([0-9\p{javaDigit}]))?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))\x{2e}(([0-9\p{javaDigit}]))*+|\x{2e}(([0-9\p{javaDigit}]))++)([eE][+-]?(([0-9\p{javaDigit}]))+)?)|(((((([0-9\p{javaDigit}]))++)|([\p{javaDigit}&&[^0]](([0-9\p{javaDigit}]))?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))|(((([0-9\p{javaDigit}]))++)|([\p{javaDigit}&&[^0]](([0-9\p{javaDigit}]))?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))\x{2e}(([0-9\p{javaDigit}]))*+|\x{2e}(([0-9\p{javaDigit}]))++)([eE][+-]?(([0-9\p{javaDigit}]))+)?)|(\Q-\E((((([0-9\p{javaDigit}]))++)|([\p{javaDigit}&&[^0]](([0-9\p{javaDigit}]))?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))|(((([0-9\p{javaDigit}]))++)|([\p{javaDigit}&&[^0]](([0-9\p{javaDigit}]))?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))\x{2e}(([0-9\p{javaDigit}]))*+|\x{2e}(([0-9\p{javaDigit}]))++)([eE][+-]?(([0-9\p{javaDigit}]))+)?))|[-+]?0[xX][0-9a-fA-F]*\.[0-9a-fA-F]+([pP][-+]?[0-9]+)?|(([-+]?(NaN|\QNaN\E|Infinity|\Q∞\E))|((NaN|\QNaN\E|Infinity|\Q∞\E))|(\Q-\E(NaN|\QNaN\E|Infinity|\Q∞\E)))
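To make the point about complexity concrete, the sketch below runs the same input through Scanner (with Locale.US and a whitespace/comma delimiter) and through the simple pattern -?\\d+(?:\\.\\d+)? from the regex answer. The input string "1.5e3 +2 .5" and the class and method names are made up for this comparison.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CompareParsers {
    // Scanner-based: relies on Scanner's built-in float pattern.
    static List<Double> viaScanner(String input) {
        List<Double> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            sc.useLocale(Locale.US);
            sc.useDelimiter("[\\s,]+");
            while (sc.hasNext()) {
                if (sc.hasNextDouble()) {
                    out.add(sc.nextDouble());
                } else {
                    sc.next(); // skip non-numeric tokens
                }
            }
        }
        return out;
    }

    // Simple regex: only optional minus, digits, optional fraction.
    static List<Double> viaSimpleRegex(String input) {
        List<Double> out = new ArrayList<>();
        Matcher m = Pattern.compile("-?\\d+(?:\\.\\d+)?").matcher(input);
        while (m.find()) {
            out.add(Double.parseDouble(m.group()));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "1.5e3 +2 .5";
        System.out.println("Scanner: " + viaScanner(input));     // [1500.0, 2.0, 0.5]
        System.out.println("Regex:   " + viaSimpleRegex(input)); // [1.5, 3.0, 2.0, 5.0]
    }
}
```

The exponent and the missing integer part are exactly the kinds of cases the long pattern above handles and a short hand-written regex quietly gets wrong.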

Answer:

First of all: I would use the regex solution, too; it's better. The following is just an alternative that uses split and replaceAll and catches the exception thrown for words that aren't numbers:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Main {
    public static void main(String[] args) {
        // input
        String s = "Doubles -1.0, 0, 1, 1.12345 and 2.50";
        // split by whitespace(s) (keep in mind the commas will stay)
        String[] parts = s.split("\\s+");
        // create a collection to store the Doubles
        List<Double> nums = new ArrayList<>();
        // stream the result of the split operation and try to parse each part
        Arrays.stream(parts).forEach(p -> {
            try {
                // remove all commas and parse the value
                nums.add(Double.parseDouble(p.replaceAll(",", "")));
            } catch (NumberFormatException e) {
                // this won't work for words like "Doubles", so print an error for those
                System.err.println("Could not parse \"" + p + "\"");
            }
        });
        // finally print all successfully parsed Double values
        nums.forEach(System.out::println);
    }
}

Output:

Could not parse "Doubles"
Could not parse "and"
-1.0
0.0
1.0
1.12345
2.5
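If you prefer the split approach, the try/catch can be moved into a small helper that returns an Optional, so that a stream collects a List<Double> directly. This is a sketch under a few assumptions: the helper tryParse and the class name SplitDoubles are hypothetical, it splits on "[\\s,]+" instead of removing commas afterwards, and Optional::stream requires Java 9 or later.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class SplitDoubles {
    // Hypothetical helper: an empty Optional instead of an exception for non-numbers.
    static Optional<Double> tryParse(String token) {
        try {
            return Optional.of(Double.parseDouble(token));
        } catch (NumberFormatException e) {
            return Optional.empty(); // e.g. "Doubles", "and"
        }
    }

    public static List<Double> getDoublesFromString(String input) {
        return Arrays.stream(input.split("[\\s,]+")) // whitespace and commas both separate tokens
                .map(SplitDoubles::tryParse)
                .flatMap(Optional::stream)            // keep only the successful parses
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        getDoublesFromString("Doubles -1.0, 0, 1, 1.12345 and 2.50").forEach(System.out::println);
    }
}
```

Keep in mind that Double.parseDouble is more permissive than the regex: words such as "NaN" or "Infinity" in the text would be parsed as numbers rather than skipped.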