How do I update my GitHub Actions CI pipeline so that, if any variation of the attacks demonstrated in the Trojan Source whitepaper is submitted as a PR to my GitHub repository, the PR is either automatically rejected or a comment is added to the PR warning about the vulnerability?
Background: on 2021-10-30, Nicholas Boucher and Ross Anderson published a paper titled Trojan Source: Invisible Vulnerabilities, which outlined several ways that Unicode can be used maliciously in code submissions that appear (pixel-for-pixel) identical to non-malicious code but are in fact malicious. Besides the more obvious "ambiguous characters" (homoglyphs) used to define and call distinct functions, they specifically describe how a clever attacker can use Unicode bidirectional control characters to do some very nasty things.
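To make the two attack classes concrete, here is a minimal Python sketch of what "invisible" means in practice. The helper name `describe_suspicious` and the sample strings are hypothetical, made up for illustration. The set of bidirectional controls is my assumption of the relevant code points (U+202A through U+202E and U+2066 through U+2069); it is not copied from the paper.

```python
import unicodedata

# Assumed set of bidirectional formatting characters:
# U+202A..U+202E (embeddings/overrides/pop) and U+2066..U+2069 (isolates/pop).
BIDI_CONTROLS = {chr(cp) for cp in (*range(0x202A, 0x202F), *range(0x2066, 0x206A))}


def describe_suspicious(text):
    """Return (code point, Unicode name, is_bidi_control) for each non-ASCII character."""
    findings = []
    for ch in text:
        if ord(ch) > 0x7F:
            findings.append(
                (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"), ch in BIDI_CONTROLS)
            )
    return findings


# Hypothetical homoglyph example: the second identifier uses Cyrillic "а" (U+0430),
# which renders like Latin "a" in many fonts but names a different function.
latin_name = "pay"
lookalike_name = "p\u0430y"

# Hypothetical bidi example: a right-to-left override hidden inside a comment.
hidden_override = "is_admin = False  # \u202e harmless note"

print(latin_name == lookalike_name)
print(describe_suspicious(lookalike_name))
print(describe_suspicious(hidden_override))
```

Both strings would pass a visual review, but the reported code points make the difference obvious.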
More background: I manage an open-source Python project that's hosted on GitHub. GitHub has since added warnings when viewing code that contains potentially malicious Unicode characters, but before that, visually detecting these issues in the GitHub web UI when merging a PR was impossible.
My question is: how can I protect myself from yet-to-be-discovered malicious Unicode commits, and from other literally-impossible-to-see vulnerabilities?
What can I add to my GitHub Actions CI pipeline to warn me about invisible dangers in user-contributed Python code?
EDIT: Examples that should be caught include the following python snippets:
CodePudding user response:
You can add a workflow to your GitHub Actions pipelines that detects non-ASCII characters and automatically posts a WARNING comment on the PR.
Add the following to .github/workflows/unicode_warn.yml in the root of your repo:
################################################################################
# File:    .github/workflows/unicode_warn.yml
# Version: 0.1
# Purpose: Detects Unicode in PRs and comments the results of findings in PR
# Authors: Michael Altfield
# Created: 2021-11-20
# Updated: 2021-11-20
################################################################################
name: malicious_sanity_checks
# execute this workflow automatically on all PRs
on: [pull_request]
jobs:
  unicode_warn:
    runs-on: ubuntu-latest
    container: debian:bullseye-slim
    steps:
      - name: Prereqs
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          apt-get update
          apt-get install -y git bsdmainutils
          git clone "https://token:${GITHUB_TOKEN}@github.com/${GITHUB_REPOSITORY}.git" .
        shell: bash
      - name: Check diff for unicode
        id: unicode_diff
        run: |
          set -x
          # keep only the lines added by the PR, dropping the file headers
          diff=`git diff --unified=0 ${{ github.event.pull_request.base.sha }} ${{ github.event.pull_request.head.sha }} | grep -E "^[+]" | grep -Ev '^(--- a/|\+\+\+ b/)'`
          # extract every run of non-ASCII characters from the added lines
          unicode_diff=`echo -n "${diff}" | grep -oP "[^\x00-\x7F]*"`
          unicode_grep_exit_code=$?
          echo "${unicode_diff}"
          unicode_diff_hexdump=`echo -n "${unicode_diff}" | hd`
          echo "${unicode_diff_hexdump}"
          # did we select any unicode characters?
          if [[ "${unicode_diff_hexdump}" == "" ]]; then
            # we didn't find any unicode characters
            human_result="INFO: No unicode characters found in PR's commits"
            echo "${human_result}"
          else
            # we found at least 1 unicode character
            human_result="^^ WARNING: Unicode characters found in diff!"
            echo "${human_result}"
            echo "${diff}"
          fi
          echo "UNICODE_HUMAN_RESULT=${human_result}" >> $GITHUB_ENV
        shell: bash {0}
      # leave a comment on the PR. See also
      #  * https://stackoverflow.com/a/64126737
      # make sure this doesn't open command injection risks
      #  * https://github.com/victoriadrake/github-guestbook/issues/1#issuecomment-657121754
      - name: Leave comment on PR
        uses: actions/github-script@v5
        with:
          github-token: ${{secrets.GITHUB_TOKEN}}
          script: |
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: "${{ env.UNICODE_HUMAN_RESULT }}"
            })
The above file defines a workflow named malicious_sanity_checks with a job named unicode_warn. This job contains multiple steps that will execute every time a PR is opened or updated in your repo:
- Prereqs - First, basic dependencies like git and hd (from the bsdmainutils package) are installed and the repo is cloned
- Check diff for unicode - A simple Bash script uses grep to detect non-ASCII characters in the lines added by the PR's commits-to-be-merged
- Leave comment on PR - Adds a comment on the PR indicating whether or not the commits include Unicode characters
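For readers who would rather express the check in Python than in grep, the middle step can be sketched roughly as follows. This is an illustration of the same idea, not part of the workflow above: the function names (`added_lines`, `non_ascii_findings`, `human_result`, `scan_refs`) are hypothetical, and `scan_refs` assumes it runs inside a git checkout that already contains both commits.

```python
import subprocess


def added_lines(diff_text):
    """Yield the lines a unified diff adds, without the leading '+'."""
    for line in diff_text.splitlines():
        # skip the "+++ b/<file>" header, as the grep -Ev in the workflow does
        if line.startswith("+") and not line.startswith("+++ b/"):
            yield line[1:]


def non_ascii_findings(diff_text):
    """Return (line, [code points]) for each added line containing non-ASCII characters."""
    findings = []
    for line in added_lines(diff_text):
        code_points = [f"U+{ord(ch):04X}" for ch in line if ord(ch) > 0x7F]
        if code_points:
            findings.append((line, code_points))
    return findings


def human_result(findings):
    """Build the same kind of one-line summary that the workflow posts as a comment."""
    if findings:
        return "WARNING: Unicode characters found in diff!"
    return "INFO: No unicode characters found in PR's commits"


def scan_refs(base_sha, head_sha):
    """Diff two commits with git and scan the result (assumes a local checkout)."""
    diff = subprocess.run(
        ["git", "diff", "--unified=0", base_sha, head_sha],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    ).stdout
    return non_ascii_findings(diff)
```

If you want the PR check to fail rather than only warn, one option is to call `scan_refs` from a small script and exit with a non-zero status whenever it returns any findings. Failing on every non-ASCII character may be too strict for projects that legitimately contain translated strings or non-English comments.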
For more information about this and the Trojan Source vulnerabilities that this can protect you against, see the source: