I am using the following code to find occurrences of blacklisted words inside sentences:
// this will return rows like "volvo is a fast car like bmw"
Optional<ProcessedWords> keywords = processedWordsService.findRandomKeywordWhereTrademarkBlacklistedIsEmpty();
if (keywords.isPresent())
{
    // this will return data rows like "volvo", "ibm", "bmw"
    List<BlacklistedWords> blacklistedWords = blacklistedWordsService.findAll();
    List<String> list = new ArrayList<>();
    for (BlacklistedWords item : blacklistedWords) {
        list.add(item.getKeyword());
    }
    ProcessedWords processedWords = keywords.get();
    String keyword = processedWords.getKeyword();
    if (list.contains(keyword))
    {
        System.out.println("Found blacklisted word in keyword: " + keyword);
    }
}
As you can see, I am trying to find trademarks inside sentences. The current code checks whether the whole sentence is equal to one of the blacklisted words here:
if(list.contains(keyword))
How can I implement this search properly?
I need a way to search for all blacklisted keywords inside my sentences.
CodePudding user response:
Your naming is a bit misleading. Nevertheless, assuming that your list looks like this:
List<String> list = List.of("volvo", "ibm", "bmw");
and your keyword like this:
String keyword = "volvo is a fast car like bmw";
you need to iterate over your list and check whether each string occurs in your sentence (keyword). One way to do so is to use the Pattern class:
List<String> list = List.of("volvo", "ibm", "bmw");
String keyword = "volvo is a fast car like bmw";
list.forEach(element -> {
    Predicate<String> predicate = Pattern.compile(Pattern.quote(element)).asPredicate();
    if (predicate.test(keyword)) {
        System.out.printf("Found blacklisted word [%s] in keyword: %s %n", element, keyword);
    }
});
If you want to avoid too much output noise, you could filter the list beforehand and summarize the result:
List<String> filteredList = list.stream()
        .filter(e -> Pattern.compile(Pattern.quote(e)).asPredicate().test(keyword))
        .collect(Collectors.toList());
if (!filteredList.isEmpty()) {
    System.out.printf("Found blacklisted words %s in keyword: %s %n", filteredList, keyword);
}
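Note that the pattern above matches substrings, so a blacklisted term that is part of a longer word would also be reported, and the match is case-sensitive. If that is not what you want, the following is a minimal sketch that matches whole words only and ignores case. The class name BlacklistMatcher and the method findBlacklisted are hypothetical names chosen for illustration, not part of your code:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class BlacklistMatcher {

    // Hypothetical helper: returns the blacklisted words that occur
    // as whole words in the sentence, ignoring case.
    static List<String> findBlacklisted(List<String> blacklist, String sentence) {
        return blacklist.stream()
                .filter(word -> Pattern
                        // \b anchors the quoted word at word boundaries
                        .compile("\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE)
                        .matcher(sentence)
                        .find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> blacklist = List.of("volvo", "ibm", "bmw");
        String sentence = "Volvo is a fast car like bmw";
        List<String> hits = findBlacklisted(blacklist, sentence);
        if (!hits.isEmpty()) {
            System.out.printf("Found blacklisted words %s in keyword: %s%n", hits, sentence);
        }
    }
}
```

If the blacklist is large or checked often, it may be worth compiling the patterns once and reusing them instead of recompiling for every sentence.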
CodePudding user response:
If I understand this correctly, you want to check your sentences for the presence of any of the possible 'trademarks', not just a single one. This can be done with a simple lambda like this:
carNames.stream().anyMatch(
        car -> sentences.stream()
                .map(String::toLowerCase).collect(Collectors.joining())
                .contains(car.toLowerCase()));
This takes all sentences, converts them to lower case, and joins them into a single string before checking it with a simple String contains. It returns true if any car name is found and false otherwise. If you want to take some action when a 'trademark' is found, replace the anyMatch with a forEach and do something like this:
carNames.forEach(car -> {
    if (sentences.stream()
            .map(String::toLowerCase).collect(Collectors.joining())
            .contains(car.toLowerCase())) {
        log.info("List contains trademark: {}", car);
    }
});
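Two details of the snippets above may be worth adjusting: the sentences are lower-cased and joined again for every car name, and Collectors.joining() without a delimiter glues the end of one sentence to the start of the next, which could produce a match that spans two sentences. A minimal sketch that joins once, with a space as delimiter, and collects the matches might look like this. The class name TrademarkScan and the method findTrademarks are hypothetical, and the sample data is only an assumption:

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class TrademarkScan {

    // Hypothetical helper: returns every entry of carNames that appears
    // (case-insensitively) somewhere in the given sentences.
    static List<String> findTrademarks(List<String> carNames, List<String> sentences) {
        // Lower-case and join once; the space keeps sentence ends from fusing together.
        String haystack = sentences.stream()
                .map(s -> s.toLowerCase(Locale.ROOT))
                .collect(Collectors.joining(" "));
        return carNames.stream()
                .filter(car -> haystack.contains(car.toLowerCase(Locale.ROOT)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> carNames = List.of("volvo", "ibm", "bmw");
        List<String> sentences = List.of("Volvo is a fast car", "I also like BMW");
        System.out.println(findTrademarks(carNames, sentences)); // prints [volvo, bmw]
    }
}
```

This is still a plain substring check, so a name that is part of a longer word will also be reported.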