Bash script: Comparing strings in inline array vs. array imported with mapfile

Time:11-02

I have two arrays, one with a list of domains and another with a list of blacklisted domains, which I want to compare. The idea is that the script is going to do X if the domain isn't blacklisted.

The script works fine if I make an array like this:

    DOMAINS=(
    'domain1.no 443'
    'domain2.no 443'
    'domain3.no 443'
    'domain4.no 443'
    'domain5.no 443'
    )
    BLACKLIST=(
    'domain1.no'
    'domain3.no'
    'domain5.no'
    )

Output:
domain1.no is blacklisted
domain2.no is NOT blacklisted
domain3.no is blacklisted
domain4.no is NOT blacklisted
domain5.no is blacklisted

But if I create the arrays by importing the domains from domains.txt and blacklist.txt with mapfile -t, the script doesn't work:

mapfile -t DOMAINS < domains.txt
mapfile -t BLACKLIST < blacklist.txt

domains.txt contents:
domain1.no 443
domain2.no 443
domain3.no 443
domain4.no 443
domain5.no 443

blacklist.txt contents:
domain1.no
domain3.no
domain5.no

Output:
domain1.no is NOT blacklisted
domain2.no is NOT blacklisted
domain3.no is NOT blacklisted
domain4.no is NOT blacklisted
domain5.no is blacklisted

This is the rest of the script:

function test_function ()
{
    host=$1
    is_blacklisted=0
    
    for domain in "${BLACKLIST[@]}"; do
        if [[ " $host " == *" $domain "* ]]; then
            is_blacklisted=1
        fi
    done
    
    if [ $is_blacklisted == 1 ]; then
        printf "%s\n" "$host is blacklisted"
        
    elif [ $is_blacklisted == 0 ]; then
        printf "%s\n" "$host is NOT blacklisted"
    fi
}

for domain in "${DOMAINS[@]}"; do
    test_function $domain
done

My question is, what is the reason that the comparison doesn't work properly when using the mapfile array?

I'm very, very new to bash scripting (and to this site), so my code might not be very good, and obvious answers will probably not be so obvious to me!

'443' is added to the DOMAINS array for another script that checks SSL, which is why it's there but not used in this script. I wanted to use these .txt files so that I don't have to update each script's array manually and can just update the .txt file instead.

If it matters, I'm using Ubuntu/WSL from Microsoft app store.

CodePudding user response:

As discussed in the comments, the most likely issue was a \r character left at the end of each line. Here’s a possible solution that removes such \r characters. Also, it preprocesses the blacklist into an associative array for more efficient lookups.

#!/bin/bash
set -euo pipefail

if (($# != 2)); then
  echo "Usage: ${0} <blacklist file> <domains file>"
  exit 1
fi

is_blacklisted() {
  (($# == 2))  # crash on wrong number of arguments
  local -nr linenum_map="$1"  # array passed by name reference
  local -ir linenum='linenum_map["$2"]'  # integer evaluation
  if ((linenum)); then
    printf '%s is blacklisted on line %d\n' "$2" "$((linenum))"
    return 1  # easier for callers than output parsing
  else
    printf '%s is NOT blacklisted\n' "$2"
  fi
}

readarray -t input < "$1"
declare -Ai blacklist
for i in "${!input[@]}"; do
  ((blacklist["${input[i]%$'\r'}"] = i + 1))  # domain without \r -> line
done

readarray -t input < "$2"
domains=("${input[@]% *}")  # remove everything after space (maybe also \r)
for domain in "${domains[@]}"; do
  is_blacklisted 'blacklist' "$domain" || :  # don't crash on error
done

It seems to work reasonably with the input examples provided:

$ /tmp/blacklist.sh /tmp/blacklist.txt /tmp/domains.txt
domain1.no is blacklisted on line 1
domain2.no is NOT blacklisted
domain3.no is blacklisted on line 2
domain4.no is NOT blacklisted
domain5.no is blacklisted on line 3
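
If you would rather keep the original script and only fix the comparison, a smaller change may be enough. The following is a minimal sketch, assuming the .txt files were saved with Windows (CRLF) line endings and that the last line has no line ending at all, which would explain why only domain5.no matched. The temporary file, the cr variable and the is_blacklisted helper are made up for illustration and are not part of the original script.

```bash
#!/bin/bash
# Build a sample blacklist with CRLF line endings and no final newline
# (an assumption about how blacklist.txt was saved, not a confirmed fact).
tmp=$(mktemp)
printf 'domain1.no\r\ndomain3.no\r\ndomain5.no' > "$tmp"

mapfile -t BLACKLIST < "$tmp"
rm -f "$tmp"

# %q quotes each element so that a hidden trailing carriage return
# becomes visible in the output.
printf '%q\n' "${BLACKLIST[@]}"

# Strip one trailing \r from every element; elements without one are unchanged.
cr=$'\r'
BLACKLIST=("${BLACKLIST[@]%"$cr"}")

# Hypothetical helper: succeeds (exit status 0) if $1 exactly matches an entry.
is_blacklisted() {
    local host=$1 entry
    for entry in "${BLACKLIST[@]}"; do
        if [[ $host == "$entry" ]]; then
            return 0
        fi
    done
    return 1
}

for host in domain1.no domain2.no domain3.no; do
    if is_blacklisted "$host"; then
        printf '%s is blacklisted\n' "$host"
    else
        printf '%s is NOT blacklisted\n' "$host"
    fi
done
```

The same stripping line can be applied to DOMAINS right after its mapfile call. Converting the files once with a tool such as dos2unix, if it is installed, should also avoid the problem.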