Extracting hashtags from a sentence given by a user without using a List in Java

Time:09-19

I'm trying to write a program where a user can post a comment and the program extracts the hashtags from it, e.g.

I love to #program in #java

would show the output

#program
#java

What I have currently does not run, although no errors are detected.

import java.util.Scanner;

class userInput {
    public static Scanner input = new Scanner(System.in);

    public static String readString(String message){
        System.out.println(message);
        String readValue = input.nextLine();
        return readValue;
    }

    public static int readInt(String message){
        System.out.println(message);
        int readValue = input.nextInt();
        input.nextLine();
        return readValue;
    }

    public static double readDouble(String message){
        System.out.println(message);
        double readValue = input.nextDouble();
        input.nextLine();
        return readValue;
    }

    public static void close(){
        input.close();
    }

    public static void main(String[] args) {
        String post [] = new String [5];
        String userPost = "";
        userPost = userInput.readString("Type your post");
        post[0] = userPost;
        String hashtags ="";
        for (int i = 0; i < post.length && post[i] != null; i++) {
            String[] words = post[i].split(" ");
            for (int j = 0; j < words.length; j++) {
                if (words[j].trim().startsWith("#")) {
                    hashtags += words[j].trim() + " ";
                }
            }
        }
        if(hashtags.trim().isEmpty())
            System.out.println("No hashtags were typed");
        else 
            System.out.println("Hashtags found:" + hashtags);
    }
}

CodePudding user response:

I would use regular expressions.

In the code below, the pattern I search for is a # character followed by one or more lowercase letters, which is what I understood from the example in your question. If that is not what you need, you will have to change the pattern. Refer to the documentation for java.util.regex.Pattern; there are also many questions here about regular expressions in Java.

Also note that the code below uses the stream API. The Matcher.results() method was added in JDK 9, so you need at least that version to run it.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Comments {

    public static void main(String[] strings) {
        String sentence = "I love to #program in #java.";
        Pattern regex = Pattern.compile("#[a-z]+");
        Matcher mtchr = regex.matcher(sentence);
        mtchr.results()
             .forEach(mr -> System.out.println(mr.group()));
    }
}

The above code produces the following output:

#program
#java
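
If you are on JDK 8, where Matcher.results() is not available, the same idea can be sketched with a Matcher.find() loop. This is only a sketch, not the code from the answer above: the class name HashtagFinder, the method name extractHashtags and the broader pattern #\w+ (letters, digits and underscore after the #) are assumptions made for illustration. The matches are joined into a single String, so no List is involved.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashtagFinder {

    // Assumed pattern: '#' followed by one or more word characters (letters, digits, '_').
    private static final Pattern HASHTAG = Pattern.compile("#\\w+");

    // Returns the hashtags separated by single spaces, or "" if none were found.
    static String extractHashtags(String sentence) {
        StringBuilder found = new StringBuilder();
        Matcher matcher = HASHTAG.matcher(sentence);
        while (matcher.find()) {
            if (found.length() > 0) {
                found.append(' ');
            }
            found.append(matcher.group());
        }
        return found.toString();
    }

    public static void main(String[] args) {
        // prints: #program #Java8
        System.out.println(extractHashtags("I love to #program in #Java8."));
    }
}
```

Because the pattern only matches word characters, the trailing full stop is left out without a separate clean-up step.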

CodePudding user response:

You can use split(" ") to split a sentence into words, then iterate over the words and keep only those that start with a #. Finally, you should remove any punctuation mark at the end of each word. In my opinion, the most concise and readable way to do this is to use the Stream API (introduced in Java 8) with the filter() and map() methods. Note that the code below also uses var, which requires Java 10, and Stream.toList(), which requires Java 16. Instead of returning a List using toList(), you could of course also return an array using toArray().

import java.util.*;

public class Application {
    public static void main(String[] args) {
        var sentence = "I love to #program in #java.";
        System.out.printf("Hashtags in sentence: %s\n", findHashtags(sentence));
    }

    public static List<String> findHashtags(String sentence){
        var punctuationMarksAtEndOfWordRegex = "[.?,:!]$";
        return Arrays.stream(sentence.split(" "))
                .filter(word -> word.startsWith("#"))
                .map(hashtag -> hashtag.replaceAll(punctuationMarksAtEndOfWordRegex, "")).toList();
    }
}
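
Since the question asks for a solution without a List, the toArray() variant mentioned above could look roughly like this. It is a sketch under assumptions: the class name HashtagArray is made up for illustration, and the pipeline is otherwise the same split/filter/map chain. It avoids var and toList(), so it should also compile on Java 8.

```java
import java.util.Arrays;

class HashtagArray {

    // Same split/filter/map pipeline, but collected into a String[] instead of a List.
    static String[] findHashtags(String sentence) {
        return Arrays.stream(sentence.split(" "))
                .filter(word -> word.startsWith("#"))
                // Strip one trailing punctuation mark, e.g. "#java." becomes "#java".
                .map(hashtag -> hashtag.replaceAll("[.?,:!]$", ""))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] hashtags = findHashtags("I love to #program in #java.");
        // prints: #program #java
        System.out.println(String.join(" ", hashtags));
    }
}
```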

CodePudding user response:

A really naive way is to loop over the comment and check each character for a #. Once we find one, we start an inner loop that appends characters to the result string, starting at the current index and continuing until we reach a space or the end of the comment.

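public static String extract(String comment)
{
    String result = "";
    for (int i = 0; i < comment.length(); i++)
    {
        char current = comment.charAt(i);
        if (current == '#')
        {
            // Copy characters from the '#' up to the next space or the end of the comment.
            for (int j = i; j < comment.length() && comment.charAt(j) != ' '; j++)
                result += comment.charAt(j);
            // Separate consecutive hashtags with a space so they do not run together.
            result += " ";
        }
    }
    return result;
}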
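
The loop above stops only at a space, so a comment ending in "#java." would keep the full stop. One possible refinement, sketched here under the assumption that a hashtag consists of a # followed by letters or digits only, is to stop at the first character that is neither. The class name NaiveHashtags is hypothetical, and a StringBuilder is swapped in for repeated String concatenation.

```java
class NaiveHashtags {

    // Scans the comment once; a hashtag is assumed to be '#' followed by letters or digits.
    static String extract(String comment) {
        StringBuilder result = new StringBuilder();
        int i = 0;
        while (i < comment.length()) {
            if (comment.charAt(i) == '#') {
                int end = i + 1;
                // Advance to the first character that is not part of the hashtag.
                while (end < comment.length() && Character.isLetterOrDigit(comment.charAt(end))) {
                    end++;
                }
                if (end > i + 1) { // ignore a lone '#'
                    if (result.length() > 0) {
                        result.append(' ');
                    }
                    result.append(comment, i, end);
                }
                i = end; // continue after the characters just consumed
            } else {
                i++;
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // prints: #program #java
        System.out.println(extract("I love to #program in #java."));
    }
}
```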