Home > OS >  How to find the Longest repeated Substring using Java Sream?
How to find the Longest repeated Substring using Java Sream?

Time:06-26

For a school assignment, I have to implement a method that will return the longest repeated substring from a string. But I have to do it by using only the Stream API.

This is what I have done so far:

public static String biggestRedundantSubstring(String s) {
    Stream.Builder<String> stringBuilder = Stream.builder();
    while (!Objects.equals(s, "")) {
        stringBuilder.add(s);
        s = s.substring(1);
    }
    return stringBuilder.build().sorted().reduce("",
            (String biggestRedundantSubstring, String matchingPrefix) ->
                    biggestRedundantSubstring.length() > matchingPrefix.length() ?
                            biggestRedundantSubstring : matchingPrefix,
            (String sub1, String sub2) -> {
                String matchingPrefix = "";
                int limitIndex = Math.max(sub1.length(), sub2.length()) - 1;
                for (int i = 0; i < limitIndex; i  ) {
                    if (sub1.charAt(i) == sub2.charAt(i)) {
                        matchingPrefix  = sub1.charAt(i);
                    } else {
                        break;
                    }
                }       
                return matchingPrefix;
            });
}

And this is the main() method:

public static void main(String[] args) {
    if (args.length != 0) {
        for (String s : args) {
            System.out.println(s   " "   biggestRedundantSubstring(s));
        }
    } else {
        String s = "ababaac";
        System.out.println(s   " "   biggestRedundantSubstring(s));
    }
}

This without arguments should print on console:

ababaac aba

But instead it prints:

ababaac ababaac

So I was wondering if it is possible? I also tried to use reduce() with three parameters, but it doesn't work either, and I could use a variable outside of the stream to keep track of and change the value of the "biggestRedundantSubstring" variable, but it kind of goes against why to use streams, doesn't it?

CodePudding user response:

Firstly, let's describe the overall algorithm:

  • We need to generate all possible substrings. I.e. for every valid index of the string from 0 up to str.length() - 1 we have to construct all substrings having the length from 1 to str.length() - index.

  • Then we need to check which of these substrings occur more than once and pick the longest of them.

To generate all substring, we need to walk through all the valid indices using nested IntStream (the same can be done using a nested for loop). And when we have all the possible starting and ending indices, we can turn them into substrings.

Then to find out which of these substrings appear more than once, we can use a map Map<String,Boolean>, in which keys would be the strings itself and a value mapped to each key would signify whether it's repeated or not. We can do this by using collector toMap(keyMapper,valueMapper,mergeFunction).

Then when we need to create a stream over the entries of the auxiliary map. Filter out repeated entries and pick the one that has the highest length by using max() operation that returns an Optional object and expects a Comparator as an argument.

That's how it might look like:

public static String biggestRedundantSubstring(String str) {
    
    return IntStream.range(0, str.length()) // generating starting indices
        .boxed()
        .flatMap(start -> IntStream.rangeClosed(start   1, str.length()).mapToObj(end -> str.substring(start,  end))) // generating ending indices and turning each combination of `start` and `end` into a substring
        .collect(Collectors.toMap(   // creating an intermediate map Map<String, Boolean>
            Function.identity(),
            s -> false,              // is repeated substring = false, because element has been encountered for the first time
            (v1, v2) -> true         // substring has been encountered more than once, i.e. is proved to be repeated
        ))
        .entrySet().stream()
        .filter(Map.Entry::getValue) // filtering the repeated substrings
        .max(Map.Entry.comparingByKey(Comparator.comparingInt(String::length))) // returns Optional<Map.Entry<String, Boolean>>
        .map(Map.Entry::getKey)      // returns Optional<String>
        .orElse("");                 // if Optional is empty return an empty string
}

main()

public static void main(String[] args) {
    String s = "ababaac";
    System.out.println(s   " "   biggestRedundantSubstring(s));
}

Output:

ababaac aba
  • Related