TCL regex match only full words in a list

Time:12-24

I am working with the regexp command in Tcl to count how many times a string occurs in a list. I have created the following example:

set text "Gi Gi Gi Tw Tw Tw Tw Tw Tw Tw Tw Tw Tw Twe Twe Gi Gi Te Te Te Te"
set things2searchfor "Gi Te Tw Twe"

foreach entry $things2searchfor {
    set allregexmatches [regexp -all -inline $entry $text]
    set numbrofmatches [llength $allregexmatches]
    puts "There are $numbrofmatches matches found for $entry and they all are: $allregexmatches"
}

When I run the script, it produces the following output:

There are 5 matches found for Gi and they all are: Gi Gi Gi Gi Gi
There are 4 matches found for Te and they all are: Te Te Te Te
There are 12 matches found for Tw and they all are: Tw Tw Tw Tw Tw Tw Tw Tw Tw Tw Tw Tw
There are 2 matches found for Twe and they all are: Twe Twe

The problem is that the original list contains only 10 Tw entries, but the regex also matches the two Twe entries, which brings the match count to 12.

Most regex solutions I found suggest using the dollar sign to mark the end of a line. That matches only the last Te, because it is the only entry at the end of the line. Other solutions say to exclude XYZ explicitly, but since I am working with variables, I cannot categorically exclude a fixed string: the input differs on every foreach iteration and is entirely different on every network device. I also tried the word boundary /b, but that does not work either.

Are there any other ways to match only full words and not parts of them? I cannot use the lsearch command, as I am using Tcl 8.3 here... (Thanks, Cisco)

CodePudding user response:

OK, I just found the solution after reading the Tcl regex help page here: https://www.tcl.tk/man/tcl8.4/TclCmd/re_syntax.html#M30 It says:

\M - matches only at the end of a word

I changed the regex in Tcl (escaping it with an additional backslash) to this:

[regexp -all -inline "$entry\\M" $text]

And now it works as expected:

There are 5 matches found for Gi (searchpattern: Gi\M) and they all are: Gi Gi Gi Gi Gi
There are 4 matches found for Te (searchpattern: Te\M) and they all are: Te Te Te Te
There are 10 matches found for Tw (searchpattern: Tw\M) and they all are: Tw Tw Tw Tw Tw Tw Tw Tw Tw Tw
There are 2 matches found for Twe (searchpattern: Twe\M) and they all are: Twe Twe
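
Note that \M anchors only the end of the word, so a pattern such as Tw\M would still match the tail of a longer word like xTw. The following is a minimal sketch that anchors both ends, using \m (start of word) together with \M (end of word). It assumes the entries contain no regex metacharacters, and the proc name countWholeWord is made up for illustration:

```tcl
# Hypothetical helper: count whole-word occurrences of $word in $text.
# \m matches only at the start of a word, \M only at the end of a word.
proc countWholeWord {word text} {
    # Without -inline, regexp -all returns the number of matches.
    return [regexp -all "\\m$word\\M" $text]
}

set text "Gi Gi Gi Tw Tw Tw Tw Tw Tw Tw Tw Tw Tw Twe Twe Gi Gi Te Te Te Te"
foreach entry {Gi Te Tw Twe} {
    puts "$entry: [countWholeWord $entry $text]"
}
```

Dropping -inline is a small simplification: when only the count is needed, there is no need to build the list of matches and call llength on it.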

Sorry for posting before reading all the manuals thoroughly. Thanks to anybody who may already have spent time looking for an answer.

CodePudding user response:

A solution without regexp

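set text "Gi Gi Gi Tw Tw Tw Tw Tw Tw Tw Tw Tw Tw Twe Twe Gi Gi Te Te Te Te"
# Initialize the counters first: before Tcl 8.5, incr fails on an unset variable.
foreach name {Gi Te Tw Twe} {
    set $name 0
}
foreach entry $text {
    switch -exact -- $entry {
        Gi  {incr Gi}
        Te  {incr Te}
        Tw  {incr Tw}
        Twe {incr Twe}
    }
}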

Not tested in 8.3

CodePudding user response:

A similar strategy to @Mkn's, but counting everything into a dict:

foreach item $text {
    dict incr counts $item
}
foreach item $things2searchfor {
    puts "$item\t[dict get $counts $item]"
}

Output:

Gi  5
Te  4
Tw  10
Twe 2
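
The dict command is built in only from Tcl 8.5 onward, so it is not available on the 8.3 interpreter mentioned in the question. The same counting idea can be sketched with a plain array instead. This is an illustrative variant, not part of the original answer; it uses only foreach, info exists, incr and array variables, and reports 0 for entries that never occur:

```tcl
set text "Gi Gi Gi Tw Tw Tw Tw Tw Tw Tw Tw Tw Tw Twe Twe Gi Gi Te Te Te Te"
set things2searchfor "Gi Te Tw Twe"

# Count every list element in an array keyed by the element itself.
foreach item $text {
    if {[info exists counts($item)]} {
        incr counts($item)
    } else {
        # First occurrence: older incr cannot start from an unset variable.
        set counts($item) 1
    }
}

foreach item $things2searchfor {
    # Entries that never occurred have no array element, so report 0.
    if {[info exists counts($item)]} {
        puts "$item\t$counts($item)"
    } else {
        puts "$item\t0"
    }
}
```

Because the list elements are compared as whole strings, Tw and Twe are counted separately without any regex.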