I want to count the number of tokens in a string with occasional delimiter duplicates. String.split()
doesn't work well in this case because it doesn't deduplicate delimiters. For example:
val msg = "The  cat won't  stay   away  from  the chickens."
val tokens = msg.split(' ')
var count = 0
for (t in tokens)
    if (t != "") count++
tokens = ["The", "", "cat", "won't", "", "stay", "", "", "away", "", "from", "", "the", "chickens."]
tokens.size = 14
count = 8
I'm looking for a simpler way to get to the count of 8 in this example. Maybe there's a regex way.
CodePudding user response:
You can use
val msg = "The  cat won't  stay   away  from  the chickens."
val regex = """\S+""".toRegex()
val tokens = regex.findAll(msg).map { it.value }
println(tokens.joinToString(", "))
// => The, cat, won't, stay, away, from, the, chickens.
See the Kotlin demo.
Here, you extract all non-whitespace text chunks from the given string.
This solution should be preferred because val tokens = msg.split("""\s+""".toRegex())
(see demo) produces extra empty items when the string has leading or trailing whitespace.
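Since the question asks for a count rather than the tokens themselves, the matches can be counted directly. The following is only a minimal sketch: the helper name countTokens is hypothetical, and the sample string assumes the repeated spaces implied by the token list in the question.

```kotlin
// Hypothetical helper: counts runs of non-whitespace characters.
fun countTokens(text: String): Int =
    Regex("""\S+""").findAll(text).count()

fun main() {
    // Assumed input, with duplicated spaces between some words.
    val msg = "The  cat won't  stay   away  from  the chickens."
    println(countTokens(msg))           // 8
    println(countTokens("  padded  "))  // 1

    // Splitting without trimming keeps an empty string at each end.
    println("  padded  ".split(Regex("""\s+""")))  // [, padded, ]
}
```

Counting matches avoids building the intermediate list of strings, and an empty or all-whitespace input simply gives 0.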
The splitting approach can work better if you first trim()
the string:
val tokens = msg.trim().split("""\s+""".toRegex())
See this Kotlin demo.
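One caveat with the trim-and-split approach, if you only need the count: splitting an empty string still returns a single empty element, so a blank input would be counted as 1. A rough sketch of a guard, where countBySplit is a hypothetical helper name:

```kotlin
// Hypothetical helper: trim first, then split on runs of whitespace.
fun countBySplit(text: String): Int {
    val trimmed = text.trim()
    // Splitting "" yields [""], so handle the blank case explicitly.
    if (trimmed.isEmpty()) return 0
    return trimmed.split(Regex("""\s+""")).size
}

fun main() {
    println(countBySplit("  The  cat won't  stay  "))    // 4
    println("   ".trim().split(Regex("""\s+""")).size)   // 1, not 0
    println(countBySplit("   "))                         // 0
}
```

The findAll approach does not need this guard, which is another reason it may be the simpler choice for counting.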