Expect is skipping some of the output buffer in favor of the first re pattern

Time:10-29

Here's my expect script:

#!/usr/bin/expect -f

spawn -noecho zsh -li
expect {
  -re {\e]\d+;[^\a]*\a} { send_error "escape found: $expect_out(buffer)\n"; exp_continue }
  "current_dir" { send_error "prompt found: $expect_out(buffer)\n" }
  timeout { exit 1 }
}

I'm basically trying to filter out escape sequences that could contain the "current_dir" string, and then match the first time that string appears outside the escape sequences.

The stderr output of running this is:

escape found: \e]1337;[email protected]\a
escape found: \e]1337;CurrentDir=/path/to/current_dir\a
escape found: \e]1337;ShellIntegrationVersion=11;shell=zsh\a
escape found: \e[1m\e[7m%\e[27m\e[1m\e[0m \r \r\e]0;current_dir\a
escape found: \e]133;D;0\a
escape found: \e]1337;[email protected]\a
escape found: \e]1337;CurrentDir=/path/to/current_dir\a
escape found: \r\e[0m\e[27m\e[24m\e[J\e]133;A\a
escape found: \r\n\e[1;36m/path/to/current_dir\e[0m on \e[1;35mmaster\e[0m \r\n\e[1;32m%\e[0m \e]133;B\a

All the "escape found" matches are valid, except for the last one. In the 2nd, 4th and 7th matches, the "current_dir" string even appears inside the escape sequence; those are the occurrences I want to ignore.

In the last "escape found" output, we see that there was an escape sequence at the end: "\e]133;B\a". But there was a "current_dir" before that which was not matched by the plain "current_dir" match pattern.

I was expecting to see:

prompt found: \r\n\e[1;36m/path/to/current_dir

instead of the last "escape found" output. But it times out instead, as that text went unnoticed by expect. (Notice that the escape sequences I'm trying to ignore start with \e], which is different from \e[.)

What am I missing here?

EDIT: Notice I'm printing the whole expect_out(buffer) above. If I changed the rules to only print expect_out(0,string), which is what each rule actually matched, I'd get:

escape found: \e]1337;[email protected]\a
escape found: \e]1337;CurrentDir=/path/to/current_dir\a
escape found: \e]1337;ShellIntegrationVersion=11;shell=zsh\a
escape found: \e]0;current_dir\a
escape found: \e]133;D;0\a
escape found: \e]1337;[email protected]\a
escape found: \e]1337;CurrentDir=/path/to/current_dir\a
escape found: \e]133;A\a
escape found: \e]133;B\a

so the text I'd like the second rule to match was not part of what the first rule matched.

CodePudding user response:

You're missing that the rules are tried in the order they are listed, and that each subsequent match starts after the point where the previous match ended. With your problem input buffer, both rules can match, so the first rule wins; it consumes everything up to the end of its match, which hides the "current_dir" text from the second rule.
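To make the ordering effect concrete, here is a rough simulation in plain Tcl (no spawn needed). The simulate_expect proc and the rule names are made up for illustration; it only mimics the "first listed pattern that matches wins, and everything up to the end of its match is consumed" behaviour, and ignores how real output arrives in chunks.

```tcl
# Hypothetical helper: mimics how a single expect call picks among its rules.
# Rules are tried in list order; the first pattern that matches anywhere in
# the buffer wins, and everything up to the end of its match is consumed.
proc simulate_expect {buffer rules} {
    foreach {name pattern} $rules {
        if {[regexp -indices -- $pattern $buffer span]} {
            set end [lindex $span 1]
            set consumed [string range $buffer 0 $end]
            set remaining [string range $buffer [expr {$end + 1}] end]
            return [list $name $consumed $remaining]
        }
    }
    return [list none "" $buffer]
}

# The buffer from the last "escape found" line of the question.
set buffer "\r\n\x1b\[1;36m/path/to/current_dir\x1b\[0m on \x1b\[1;35mmaster\x1b\[0m \r\n\x1b\[1;32m%\x1b\[0m \x1b\]133;B\a"

# Same rule order as in the question: the escape rule is listed first.
set question_order {
    escape {\e]\d+;[^\a]*\a}
    prompt {current_dir}
}

# Same rules with the prompt rule listed first.
set prompt_first {
    prompt {current_dir}
    escape {\e]\d+;[^\a]*\a}
}

puts [lindex [simulate_expect $buffer $question_order] 0]
puts [lindex [simulate_expect $buffer $prompt_first] 0]
```

With the question's order, the escape rule is selected even though "current_dir" appears earlier in the buffer, and nothing is left over for the second rule. Swapping the order lets the prompt rule fire on the same buffer.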

You need to write your matches somewhat differently so that the rule that picks up what you want is matched before the rule that skips over it. However, that's difficult here because there's a lot of other stuff going on; you probably need a more complex prompt match pattern, and you need to put it first.

This might work:

expect {
    -re {\e\[1;36m([^\e]*)\e\[0m on \e\[1;35m([^\e]*)\e\[0m} {
        send_error "prompt found: $expect_out(buffer)\n"
        send_error "prompt dir: $expect_out(1,string)\n"
        send_error "prompt branch: $expect_out(2,string)\n"
    }
    -re {\e]\d+;[^\a]*\a} {
        send_error "escape found: $expect_out(buffer)\n"
        exp_continue
    }
    timeout {
        exit 1
    }
}
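If you want to check the prompt pattern without spawning a shell, something like this plain Tcl snippet should do. It runs the same regular expression against the buffer quoted in the question; the variable names are only for illustration, and it assumes your real prompt looks exactly like that sample.

```tcl
# Prompt pattern from the expect block above: group 1 is the directory,
# group 2 is the branch name.
set prompt_re {\e\[1;36m([^\e]*)\e\[0m on \e\[1;35m([^\e]*)\e\[0m}

# Sample buffer copied from the question's last "escape found" line.
set buffer "\r\n\x1b\[1;36m/path/to/current_dir\x1b\[0m on \x1b\[1;35mmaster\x1b\[0m \r\n\x1b\[1;32m%\x1b\[0m \x1b\]133;B\a"

if {[regexp -- $prompt_re $buffer whole dir branch]} {
    puts "prompt dir: $dir"
    puts "prompt branch: $branch"
} else {
    puts "prompt pattern did not match"
}
```

On that sample the two groups come out as /path/to/current_dir and master, which is what expect would put in expect_out(1,string) and expect_out(2,string).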

You might want to consider setting the TERM environment variable to dumb (disabling almost all of those escapes) before spawning so that you don't have quite so much complexity to deal with. It's also often a good idea to explicitly set the prompt of controlled programs (just for the session) to something that it's easy to match. It's not to say that that's definitely going to work for you here, but those sorts of techniques often help by making the matching problem a lot easier to handle.
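A rough, untested sketch of that approach follows. The READY> marker, the 10 second timeout and the use of zsh -f (which skips your startup files, so nothing resets the prompt) are my assumptions, not something from your setup.

```tcl
#!/usr/bin/expect -f

# Assumption: a dumb terminal type makes the shell emit far fewer escapes.
set env(TERM) dumb
set timeout 10

# Assumption: -f skips the user's zsh startup files, so no shell
# integration hooks overwrite the prompt we set below.
spawn -noecho zsh -f

# The quotes are split ('READY''> ') so that the echoed command line does
# not itself contain the literal text "READY> "; only the rendered prompt does.
send -- "PROMPT='READY''> '\r"

expect {
    -ex "READY> " { send_error "prompt found\n" }
    timeout { exit 1 }
}
```

With a fixed marker like this, one exact-match rule is enough and there is no ordering problem between rules.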

You don't really need iTerm2 escape codes for this particular application, do you?

CodePudding user response:

As a reference for anyone with a similar problem (mainly if you want to be able to ignore iTerm2 escape codes), either of these two scripts worked for me:

#!/usr/bin/expect -f

spawn -noecho zsh -li
expect {
  -re {\e]133;B\a} { send_error "prompt found: $expect_out(buffer)\n" }
  -re {\e]\d+;[^\a]*\a} { exp_continue }
  "current_dir" { send_error "prompt found: $expect_out(buffer)\n" }
  timeout { exit 1 }
}

or

#!/usr/bin/expect -f

spawn -noecho zsh -li
expect {
  -re {\e]([07]|1337);[^\a]*\a} { exp_continue }
  "current_dir" { send_error "prompt found: $expect_out(buffer)\n" }
  timeout { exit 1 }
}
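For what it's worth, this is how I understand the second script: the narrower pattern no longer matches the \e]133;... markers at all, so they cannot swallow a "current_dir" that precedes them. A small plain Tcl check of that, using sequences copied from the output above (the labels and variable names are just for illustration):

```tcl
# Narrowed escape pattern from the second script.
set narrow_re {\e]([07]|1337);[^\a]*\a}

# Label -> escape sequence, copied from the output shown in the question.
set samples [list \
    "title (0)"       "\x1b\]0;current_dir\a" \
    "CurrentDir 1337" "\x1b\]1337;CurrentDir=/path/to/current_dir\a" \
    "marker 133;A"    "\x1b\]133;A\a" \
    "marker 133;B"    "\x1b\]133;B\a"]

# Print 1 when the pattern matches the sequence, 0 when it does not.
foreach {label seq} $samples {
    puts "$label: [regexp -- $narrow_re $seq]"
}
```

The first two sequences match and get skipped via exp_continue, while the two 133 markers do not match, so the plain "current_dir" rule is the only one that can fire on the prompt line.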

Thanks to @Donald Fellows and @sexpect for making it clear that expect reads the output in chunks, so the rules need to be in the right order when patterns can occur close to each other in the output.
