How to make YAML-embedded shell script safe against problems with different quotes?

I have some app which gets configured using a YAML file. That app does some processing and supports hooks to execute shell code before and after the processing. This is mostly meant to execute external script files doing the real business, but one can e.g. export environment variables as well. It seems like things are simply forwarded to a shell call with the configured string.

The important thing to note is that one hook is called specifically in case ANYTHING goes wrong during processing. In that case the app provides some additional error details to the configured shell script. This is done by reading the necessary part of the YAML config and doing a simple string replacement of special keywords with what is actually available at runtime. Those keywords follow the syntax {...}. Things look like the following in my config:

on_error:
    - |
      export BGM_ERR_VAR_CONFIG_PATH="{configuration_filename}"
      export BGM_ERR_VAR_REPO="{repository}"
      export BGM_ERR_VAR_ERROR_MSG="{error}"
      export BGM_ERR_VAR_ERROR_OUT="{output}"

      '/path/to/script.sh' 'some_arg' '[...]' [...]

Originally those keywords were expected to be forwarded as arguments to the called script, but my script needs some other arguments already, so I decided to forward things using environment variables. That shouldn't make too much of a difference regarding my problem, though.

The problem is that really ANYTHING can go wrong, and especially the placeholder {output} can contain arbitrarily complex error messages. It's most likely a mixture of executed shell commands, using single quotes in most cases, and stack traces of the programming language the app is implemented in, using double quotes. With my config above this leads to invalid shell code being executed in the end:

[2021-10-12 07:18:46,073] ERROR: /bin/sh: 13: Syntax error: Unterminated quoted string

The following is what the app logs as being executed:

[2021-10-12 07:18:46,070] DEBUG: export BGM_ERR_VAR_CONFIG_PATH="/path/to/some.yaml"
export BGM_ERR_VAR_REPO="HOST:PARENT/CHILD"
export BGM_ERR_VAR_ERROR_MSG="Command 'borg check --prefix arch- --debug --show-rc --umask 0007 HOST:PARENT/CHILD' returned non-zero exit status 2."
export BGM_ERR_VAR_ERROR_OUT="using builtin fallback logging configuration
35 self tests completed in 0.04 seconds
SSH command line: ['ssh', '-F', '/[...]/.ssh/config', 'HOST', 'borg', 'serve', '--umask=007', '--debug']
RemoteRepository: 169 B bytes sent, 66 B bytes received, 3 messages sent
Connection closed by remote host
Traceback (most recent call last):
  File "borg/archiver.py", line 177, in wrapper"

'/path/to/script.sh '[...]' '[...]' '[...]' '[...]'

The args to my own script are safe regarding quoting; those are only hard-coded paths, keywords etc., nothing dynamic in any way. The problem should be the double quotes used for the path to the Python file throwing the exception. OTOH, if I only use single quotes with my environment variables, those would break because the shell command in the output uses single quotes as well.
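The quoting conflict described above can be reproduced without the app. The following is only a sketch: the error text is a made-up stand-in for {output} (the `raise Error(...)` line in particular is hypothetical), and `sh -n` is used to parse the resulting hook line without executing it.

```shell
#!/bin/sh
# Made-up stand-in for the {output} value: it mixes both quote styles.
output=$(cat <<'EOT'
SSH command line: ['ssh', 'HOST']
  File "borg/archiver.py", line 177, in wrapper
    raise Error("can't connect")
EOT
)

# The hook line as it looks after plain string replacement, once per quote style.
double_quoted="export BGM_ERR_VAR_ERROR_OUT=\"$output\""
single_quoted="export BGM_ERR_VAR_ERROR_OUT='$output'"

# Succeeds only if sh accepts the text; -n parses without executing anything.
parses() {
    printf '%s\n' "$1" | sh -n 2>/dev/null
}

if parses "$double_quoted"; then double_ok=yes; else double_ok=no; fi
if parses "$single_quoted"; then single_ok=yes; else single_ok=no; fi

echo "double-quoted variant parses: $double_ok"
echo "single-quoted variant parses: $single_ok"
```

With this sample text neither variant should parse, because each quote style is closed early by a matching quote inside the substituted value.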

So, how do I implement a safe forwarding of {output} into the environment variable in this context?

I thought of using some subshell ="$(...)" and sed to normalize quotes, but everything I came up with resulted in a command line with exactly the same quoting problems as before. The same goes for printf and its %q to escape quotes. It seems I need something which is able to deal with arbitrary individual arguments and join them into one string again, or something like that. Additionally, the solution should not be too complex, so that it doesn't bloat the YAML config in the end.

The following might work, but loses the double quotes:

export BGM_ERR_VAR_ERROR_OUT="$(echo "{output}")"

How about that?

export BGM_ERR_VAR_ERROR_OUT="$(cat << EOT
{output}
EOT
)"

Anything else? Thanks!

CodePudding user response:

To avoid all the replacement problems, I suggest not using replacements at all and having the calling code pass the values to the hook as environment variables instead. This assumes you have control over the calling code, which I take to be the case from your explanation.

Since environment variables are uppercase by convention, putting your values in lowercase names is quite safe, and then you can simply do:

on_error:
    - |
      export BGM_ERR_VAR_CONFIG_PATH="$configuration_filename"
      export BGM_ERR_VAR_REPO="$repository"
      export BGM_ERR_VAR_ERROR_MSG="$error"
      export BGM_ERR_VAR_ERROR_OUT="$output"

      '/path/to/script.sh' 'some_arg' '[...]' [...]

The calling code would need to set up the hook's environment so that it contains the expected values. This is the safest way to forward them, since the values are never interpreted as shell syntax at all.
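As a rough illustration of that idea, the sketch below plays the role of the calling code in plain shell. The variable names `nasty`, `hook` and `result` and the error text are made up for the example; the real app would set the environment through whatever process API it uses.

```shell
#!/bin/sh
# Made-up error text with both quote styles, a dollar sign and backticks.
nasty=$(cat <<'EOT'
Command 'borg check' failed
  File "borg/archiver.py", line 177, in wrapper
    raise Error("can't reach $HOME via `hostname`")
EOT
)

# The hook text stays constant: it only references variables and never
# contains the error text itself.
hook='export BGM_ERR_VAR_ERROR_OUT="$output"
printf "%s" "$BGM_ERR_VAR_ERROR_OUT"'

# Stand-in for the calling app: put the value into the child's environment
# under a lowercase name, then hand the unmodified hook to the shell.
result=$(output="$nasty" sh -c "$hook")

if [ "$result" = "$nasty" ]; then
    echo "value arrived unchanged"
else
    echo "value was altered"
fi
```

Because the shell only ever parses the fixed hook text, quotes, dollar signs and backticks inside the value are treated as data.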

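If this is not possible, the next best thing is probably a heredoc with a quoted delimiter, so that nothing in the content is processed. You can use read to avoid the unnecessary cat, but note that read -d is not supported by every /bin/sh implementation, so check which shell actually runs the hook: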

on_error:
    - |
      read -r -d '' BGM_ERR_VAR_CONFIG_PATH <<'EOF'
      {configuration_filename}
      EOF
      export BGM_ERR_VAR_CONFIG_PATH

      # ... snip: other variables ...

      '/path/to/script.sh' 'some_arg' '[...]' [...]

The only thing you need to be aware of here is that the content must not include a line consisting solely of EOF, since that would end the heredoc early. The calling code needs to ensure this.
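If the hook runs under a /bin/sh without read -d, a variant that keeps cat might be used instead. This is a sketch under assumptions: the heredoc body is a made-up example of what the app could substitute for {output}, and BGM_END_OF_OUTPUT is an arbitrary delimiter chosen only because it is less likely than EOF to appear in real output.

```shell
#!/bin/sh
# The heredoc body stands in for the text substituted for {output};
# it is a made-up example, not real borg output.
BGM_ERR_VAR_ERROR_OUT=$(cat <<'BGM_END_OF_OUTPUT'
SSH command line: ['ssh', 'HOST']
  File "borg/archiver.py", line 177, in wrapper
    raise Error("can't connect to $HOME via `hostname`")
BGM_END_OF_OUTPUT
)
export BGM_ERR_VAR_ERROR_OUT

# Print the value back to confirm nothing was expanded or unquoted.
printf '%s\n' "$BGM_ERR_VAR_ERROR_OUT"
```

Quoting the delimiter is what keeps $HOME and the backticks literal. Inside a YAML `|` block the common indentation is removed before the shell sees the text, so the terminator still starts at the beginning of its line. Trailing newlines in the value are dropped by the command substitution.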