Is there a simpler way to count the number of tokens in a string with duplicated delimiters in Kotlin?

Time:08-10

I want to count the number of tokens in a string with occasional delimiter duplicates. String.split() doesn't work well in this case because it doesn't deduplicate delimiters. For example:

val msg = "The  cat won't  stay   away  from  the chickens."
val tokens = msg.split(' ')
var count = 0
for (t in tokens)
    if (t != "") count++


tokens = ["The", "", "cat", "won't", "", "stay", "", "", "away", "", "from", "", "the", "chickens."]
tokens.size = 14
count = 8

I'm looking for a simpler way to get to the count of 8 in this example. Maybe there's a regex way.

CodePudding user response:

You can use

val msg = "The  cat won't  stay   away  from  the chickens."
val regex = """\S+""".toRegex()
val tokens = regex.findAll(msg).map{it.value}
println(tokens.joinToString(", "))
// => The, cat, won't, stay, away, from, the, chickens.


Here, \S+ matches each run of one or more non-whitespace characters, so you extract every non-whitespace chunk of the string no matter how many spaces separate them.
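Since the question asks for the count rather than the tokens themselves, the same regex can feed straight into count(). A minimal sketch, assuming a helper named countTokens (a hypothetical name, not part of the original answer):

```kotlin
// Count tokens by matching runs of non-whitespace characters.
// countTokens is a hypothetical helper name used only for illustration.
fun countTokens(text: String): Int =
    Regex("""\S+""")      // one or more non-whitespace characters
        .findAll(text)    // lazy Sequence<MatchResult>
        .count()          // number of matches = number of tokens
```

Calling countTokens(msg) with the string from the question should give 8, and an empty or all-whitespace string gives 0 because there is nothing to match.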

This solution is preferable to val tokens = msg.split("""\s+""".toRegex()), which produces extra empty items when the string has leading or trailing whitespace.
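To see where the extra items come from, here is a small sketch that splits on whitespace runs without trimming. The helper name splitOnWhitespace is an assumption made for this example:

```kotlin
// Split on runs of whitespace WITHOUT trimming first.
// splitOnWhitespace is a hypothetical helper name used only for illustration.
fun splitOnWhitespace(text: String): List<String> =
    text.split(Regex("""\s+"""))
```

For an input such as " a b ", the whitespace at each end is treated as a delimiter with nothing on its outer side, so the resulting list has an empty string as its first and last element and its size is 4 rather than 2.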

The splitting approach can work better if you first trim() the string:

val tokens = msg.trim().split("""\s+""".toRegex())

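One caveat with trim-then-split when you only want a count: an empty or all-whitespace string still splits into a single empty item, so the list size is 1 rather than 0. A possible workaround, sketched here with the hypothetical helper names countBySplit and trimmedSplitSize, is to count only the non-empty pieces:

```kotlin
// Count tokens via split, ignoring empty pieces so that blank input yields 0.
// countBySplit is a hypothetical helper name used only for illustration.
fun countBySplit(text: String): Int =
    text.split(Regex("""\s+""")).count { it.isNotEmpty() }

// For comparison: list size after trim-then-split, with no filtering.
// trimmedSplitSize is likewise a hypothetical name.
fun trimmedSplitSize(text: String): Int =
    text.trim().split(Regex("""\s+""")).size
```

Because the empty pieces are filtered out, countBySplit does not need trim() at all. Both functions agree on the string from the question, but they differ on blank input.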
