How to find specific words in sentences

Time:01-23

I am using the following code to find occurrences of blacklisted words inside sentences:

// this will return rows like "volvo is a fast car like bmw"
Optional<ProcessedWords> keywords = processedWordsService.findRandomKeywordWhereTrademarkBlacklistedIsEmpty();

        if(keywords.isPresent())
        {
            // this will return data rows like "volvo", "ibm", "bmw"
            List<BlacklistedWords> blacklistedWords = blacklistedWordsService.findAll();
            List<String> list = new ArrayList<>();
            for(BlacklistedWords item:  blacklistedWords){
                list.add(item.getKeyword());
            }

            ProcessedWords processedWords = keywords.get();
            String keyword = processedWords.getKeyword();

            if(list.contains(keyword))
            {
                System.out.println("Found blacklisted word in keyword: " + keyword);
            }

        }

As you can see, I am trying to find trademarks inside sentences. The current code compares the whole sentence against each blacklisted keyword here, so it never matches:

if(list.contains(keyword))

How can I implement this search properly?
I need a way to search for all blacklisted keywords inside my sentences.

CodePudding user response:

Your naming is a bit misleading. Nevertheless, assuming that your list looks like this:

List<String> list = List.of("volvo", "ibm", "bmw");

and your keyword like this:

String keyword = "volvo is a fast car like bmw";

you need to iterate over your list and check whether each string occurs in your sentence (keyword). One way to do so is to use the Pattern class:

List<String> list = List.of("volvo", "ibm", "bmw");
String keyword = "volvo is a fast car like bmw";

list.forEach(element -> {
    Predicate<String> predicate = Pattern.compile(Pattern.quote(element)).asPredicate();
    if (predicate.test(keyword)){
        System.out.printf("Found blacklisted word [%s] in keyword: %s %n", element, keyword);
    }
});
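
Note that the pattern above matches substrings, so "ibm" would also be found inside a longer word. If whole-word, case-insensitive matching is wanted, a minimal sketch could look like the following. The class name WholeWordBlacklist and the method findBlacklisted are made up for illustration, and the \b word boundary assumes that blacklist entries start and end with letters or digits (an entry such as "c++" would need different handling):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original code.
class WholeWordBlacklist {

    // Returns the blacklist entries that occur in the sentence as whole words,
    // ignoring case. Pattern.quote keeps regex metacharacters in an entry literal.
    static List<String> findBlacklisted(List<String> blacklist, String sentence) {
        List<String> hits = new ArrayList<>();
        for (String word : blacklist) {
            Pattern pattern = Pattern.compile(
                    "\\b" + Pattern.quote(word) + "\\b",
                    Pattern.CASE_INSENSITIVE);
            if (pattern.matcher(sentence).find()) {
                hits.add(word);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> list = List.of("volvo", "ibm", "bmw");
        // "Volvo" and "BMW" match despite the different case.
        System.out.println(findBlacklisted(list, "Volvo is a fast car like BMW"));
        // "ibmx" is a longer word, so "ibm" is not reported.
        System.out.println(findBlacklisted(list, "the ibmx prototype"));
    }
}
```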

If you want to avoid printing too much noise, you could filter the list beforehand and summarize the output:

List<String> filteredList = list.stream()
                                .filter(e -> Pattern.compile(Pattern.quote(e)).asPredicate().test(keyword))
                                .collect(Collectors.toList());
if (!filteredList.isEmpty()) {
    System.out.printf("Found blacklisted words %s in keyword: %s %n", filteredList, keyword);
}
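
Both variants above compile one pattern per blacklist entry for every sentence. If many sentences have to be checked, one option is to compile a single alternation once and reuse it. The following is only a sketch under assumptions: the names CombinedBlacklistPattern, compile and findAll are hypothetical, the blacklist is assumed to be non-empty, and matching is whole-word and case-insensitive, which is a stricter rule than the substring check above:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the original code.
class CombinedBlacklistPattern {

    // Builds one pattern of the form \b(?:volvo|ibm|bmw)\b.
    // Assumes the blacklist is non-empty; each entry is quoted so it stays literal.
    static Pattern compile(List<String> blacklist) {
        String alternation = blacklist.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile("\\b(?:" + alternation + ")\\b", Pattern.CASE_INSENSITIVE);
    }

    // Collects every distinct blacklisted word in the sentence, lower-cased,
    // in order of first appearance.
    static Set<String> findAll(Pattern pattern, String sentence) {
        Set<String> hits = new LinkedHashSet<>();
        Matcher matcher = pattern.matcher(sentence);
        while (matcher.find()) {
            hits.add(matcher.group().toLowerCase(Locale.ROOT));
        }
        return hits;
    }

    public static void main(String[] args) {
        Pattern pattern = compile(List.of("volvo", "ibm", "bmw"));
        Set<String> hits = findAll(pattern, "volvo is a fast car like bmw");
        if (!hits.isEmpty()) {
            System.out.printf("Found blacklisted words %s%n", hits);
        }
    }
}
```

The compiled Pattern is immutable, so it can be built once when the blacklist is loaded and shared across all sentence checks.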

CodePudding user response:

If I understand this correctly, you want to check for the presence of any of the possible 'trademarks' in your sentences, not just one. This can be done with a simple lambda like this:

carNames.stream().anyMatch(
        car -> sentences.stream()
                .map(String::toLowerCase).collect(Collectors.joining())
                .contains(car.toLowerCase()));

This takes all sentences, converts them to lower case and joins them into a single string before checking with a simple String contains. anyMatch returns true if at least one 'trademark' is found and false otherwise. If you want to take some action when a 'trademark' is found - replace the anyMatch with a forEach and do something like this:

carNames.forEach(car -> {
    if (sentences.stream()
            .map(String::toLowerCase).collect(Collectors.joining())
            .contains(car.toLowerCase())) {
        log.info("List contains trademark: {}", car);
    }
});
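
Joining all sentences for every car name repeats the same work, and joining without a separator can produce a match that spans two sentences. A possible variation, given here only as a sketch, checks each sentence on its own and records which trademarks it contains. The class TrademarkScan and the method scan are hypothetical names; carNames and sentences are assumed to be lists of strings as in the snippets above:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the original code.
class TrademarkScan {

    // Maps each sentence that contains at least one trademark to the
    // lower-cased trademarks found in it (plain substring match).
    static Map<String, List<String>> scan(List<String> carNames, List<String> sentences) {
        // Lower-case the trademarks once instead of once per sentence.
        List<String> lowerNames = new ArrayList<>();
        for (String car : carNames) {
            lowerNames.add(car.toLowerCase(Locale.ROOT));
        }

        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String sentence : sentences) {
            String lower = sentence.toLowerCase(Locale.ROOT);
            List<String> found = new ArrayList<>();
            for (String car : lowerNames) {
                if (lower.contains(car)) {
                    found.add(car);
                }
            }
            if (!found.isEmpty()) {
                result.put(sentence, found);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> carNames = List.of("Volvo", "IBM", "BMW");
        List<String> sentences = List.of("volvo is a fast car like bmw", "no brands here");
        scan(carNames, sentences).forEach((sentence, found) ->
                System.out.printf("Sentence [%s] contains trademarks: %s%n", sentence, found));
    }
}
```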