Split string after every n words in java and store it in an array

Time:04-21

I have a string for example,

String s = "This is a String which needs to be split after every n words";

Suppose I have to split this string after every 5 words; the output should then be:

List<String> stringArr = List.of("This is a String which", "needs to be split after", "every n words");

How can I do this and store the result in an array in Java?

Answer:

Java has no built-in method that does this directly, but it's fairly easy to do with Java's standard regular expressions (java.util.regex).

My example below aims to be clear rather than to be the "best" way.
It's based on finding groups of five "words", each followed by a space, using the regular expression ([a-zA-Z]+ ){5}, which says:
  • [a-zA-Z]+ match one or more letters (the + means "repeated")
  • followed by a space
  • (...) gather that into a group
  • {5} match the group exactly 5 times

You may want to match things besides letters, and to allow multiple spaces or any kind of whitespace rather than a single space, so later in the example I change the regex to (\\S+\\s+){5}, where \S means any non-whitespace character and \s means any whitespace character.
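To see why the broader pattern helps, here is a small comparison sketch. The class name PatternComparison, the helper findAll, and the sample input are hypothetical, chosen only for illustration; the input contains a comma, which the letters-only pattern does not accept as part of a word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternComparison {
    // Collects every substring that the given pattern finds in the input, left to right.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical input containing punctuation
        String input = "one two, three four";
        // Letters-only: the comma after "two" breaks every two-word group
        System.out.println(findAll("([a-zA-Z]+ ){2}", input)); // []
        // Non-whitespace: "two," counts as a word
        System.out.println(findAll("(\\S+\\s+){2}", input));   // [one two, ]
    }
}
```

With this input the letters-only pattern finds no two-word group at all, while the whitespace-based pattern treats "two," as a word and matches "one two, ". The remaining "three four" is not matched by either pattern because "four" has no trailing whitespace; that leftover is what the tail handling in the answer's code collects.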

This first walks through the process in the main method, printing output along the way that I hope makes it clear what's going on, and then shows how the process can be turned into a method.
The method splits a line into groups of n words; I call it to split your string every 5 words, then again every 3 words.

Here it is:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineSplitterExample
{
    public static void main(String[] args)
    {
        String s = "This is a String which needs to be split after every n words";

      //Pattern p = Pattern.compile("([a-zA-Z]+ ){5}");
        Pattern p = Pattern.compile("(\\S+\\s+){5}");
        Matcher m = p.matcher(s);
        int last = 0;

        List<String> collected = new ArrayList<>();

        while (m.find()) {
            System.out.println("Group Count = " + m.groupCount());
            // group(0) is the whole match; groupCount() is 1 here, so this loop runs once
            for (int i=0; i<m.groupCount(); i++) {
                final String found = m.group(i);
                System.out.printf("Group %d: %s%n", i, found);
                collected.add(found);
                // keep track of where the last group ended
                last = m.end();
                System.out.println("'m.end()' is " + last);
            }
        }

        // collect the final part of the string after the last group
        String tail = s.substring(last);
        System.out.println(tail);
        collected.add(tail);

        String[] result = collected.toArray(new String[0]);
        System.out.println("result:");
        for (int n=0; n<result.length; n++) {
            System.out.printf("%2d: %s%n", n, result[n]);
        }

        // Put a little space after the output
        System.out.println("\n");


        // Now use the methods...

        String[] byFive = splitByWords(s, 5);
        displayArray(byFive);

        String[] byThree = splitByWords(s, 3);
        displayArray(byThree);
    }

    private static String[] splitByWords(final String s, final int n)
    {
      //final Pattern p = Pattern.compile("([a-zA-Z]+ ){" + n + "}");
        final Pattern p = Pattern.compile("(\\S+\\s+){" + n + "}");
        final Matcher m = p.matcher(s);

        List<String> collected = new ArrayList<>();
        int last = 0;

        while (m.find()) {
            for (int i=0; i<m.groupCount(); i++) {
                collected.add(m.group(i));
                last = m.end();
            }
        }

        collected.add(s.substring(last));

        return collected.toArray(new String[0]);
    }

    private static void displayArray(final String[] array)
    {
        System.out.println("Array:");
        for (int i=0; i<array.length; i++) {
            System.out.printf("%2d: %s%n", i, array[i]);
        }
    }
}

The output I got by running this is:

Group Count = 1
Group 0: This is a String which 
'm.end()' is 23
Group Count = 1
Group 0: needs to be split after 
'm.end()' is 47
every n words
result:
 0: This is a String which 
 1: needs to be split after 
 2: every n words


Array:
 0: This is a String which 
 1: needs to be split after 
 2: every n words
Array:
 0: This is a 
 1: String which needs 
 2: to be split 
 3: after every n 
 4: words
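If you are on Java 9 or later, the same idea can be written more compactly with Matcher.results(). This is a sketch, not the code above: it swaps in the pattern (?:\S+\s*){1,n}, whose \s* and {1,n} range let the final, shorter group match on its own, so the separate substring(last) step is not needed. It also trims the trailing whitespace visible in the output above. The class name StreamSplitter is hypothetical.

```java
import java.util.Arrays;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

public class StreamSplitter {
    // Each match is 1 to n runs of non-whitespace, each followed by optional whitespace.
    // The greedy {1,n} takes n words when they are available, so the last,
    // shorter chunk is just another match.
    static String[] splitByWords(String s, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        return Pattern.compile("(?:\\S+\\s*){1," + n + "}")
                .matcher(s)
                .results()                 // Java 9+: stream of MatchResult
                .map(MatchResult::group)   // the full text of each match
                .map(String::trim)         // drop the trailing whitespace
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String s = "This is a String which needs to be split after every n words";
        System.out.println(Arrays.toString(splitByWords(s, 5)));
        System.out.println(Arrays.toString(splitByWords(s, 3)));
    }
}
```

For your string this yields the same groups as the output above, minus the trailing spaces: [This is a String which, needs to be split after, every n words] for n = 5.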

Answer:

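You can do it with a combination of replaceAll and split:

  • S{N} - matches N iterations of S
  • () - regular expression capture group
  • (?:) - non-capturing group, used to repeat the word pattern without capturing each word separately
  • $1 - back reference to the captured group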

Replace every occurrence of N words with that occurrence followed by a special delimiter (in this case ###). Then split on that delimiter.
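To make the two steps concrete, this sketch prints the intermediate string produced by replaceAll before it is split. The class name DelimiterDemo, the method markEveryNWords, and the short input are hypothetical. The technique assumes the delimiter (### here) never occurs in the input itself; otherwise split would also break the text at those places.

```java
import java.util.Arrays;

public class DelimiterDemo {
    // Step 1: append "###" after every run of `count` words (each followed by whitespace).
    static String markEveryNWords(String s, int count) {
        return s.replaceAll("((?:\\w+\\s+){" + count + "})", "$1###");
    }

    public static void main(String[] args) {
        String marked = markEveryNWords("a b c d e", 2);
        System.out.println(marked);                    // a b ###c d ###e
        // Step 2: split on the delimiter
        String[] parts = marked.split("###");
        System.out.println(Arrays.toString(parts));    // [a b , c d , e]
    }
}
```

The final "e" is not followed by whitespace, so it is never matched and simply remains as the last piece after the split.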

public static String[] splitNWords(String s, int count) {
    // capture each run of `count` words, each followed by whitespace
    String regex = "((?:\\w+\\s+){" + count + "})";
    return s.replaceAll(regex, "$1###").split("###");
}

Demo

String s = "This is a String which needs to be split after every n words";

for (int i = 1; i < 5; i++) {
    String[] arr = splitNWords(s, i);
    System.out.println("Splitting on " + i + " words.");
    for (String st : arr) {
        System.out.println(st);
    }
    System.out.println();
}

prints

Splitting on 1 words.
This 
is 
a 
String 
which 
needs 
to 
be 
split 
after 
every 
n 
words

Splitting on 2 words.
This is 
a String 
which needs 
to be 
split after 
every n 
words

Splitting on 3 words.
This is a 
String which needs 
to be split 
after every n 
words

Splitting on 4 words.
This is a String 
which needs to be 
split after every n 
words

Answer:

I don't think there is a built-in way to split every n words; you need to specify a pattern to split on, such as a blank space. For instance, you can split on every blank, then iterate over the resulting array and build another one that groups the number of words you want.

Regards
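A minimal sketch of that split-then-regroup approach, assuming words are separated by any run of whitespace. The class name WordGrouper and the method groupWords are hypothetical. Unlike the regex answers, each group is rejoined with a single space, so no trailing whitespace is kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WordGrouper {
    // Splits on runs of whitespace, then joins every n consecutive words with a single space.
    static String[] groupWords(String s, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        String trimmed = s.trim();
        if (trimmed.isEmpty()) {
            return new String[0]; // "".split(...) would otherwise return [""]
        }
        String[] words = trimmed.split("\\s+");
        List<String> groups = new ArrayList<>();
        for (int start = 0; start < words.length; start += n) {
            int end = Math.min(start + n, words.length); // copyOfRange's end is exclusive
            groups.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
        }
        return groups.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String s = "This is a String which needs to be split after every n words";
        System.out.println(Arrays.toString(groupWords(s, 5)));
    }
}
```

This avoids regular expressions apart from the whitespace split, which some readers may find easier to adapt.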

Tags: java