I am creating a function that will accept an input and determine if the value is a certain type of hash encoding (md5, sha1, sha256, and sha512). I have asked a few classmates and logically it makes sense, but clearly something is wrong.
#!/usr/bin/bash
function identify-hash() {
  encryptinput=$(echo $1 | grep -E -i '^[a-z0-9=]+${32}')
  if [[ -n $encryptinput ]]; then
    echo "The $1 is a valid md5sum string"
    exit
  else
    encryptinput=$(echo $1 | grep -E -i '^[a-z0-9=]+${40}')
    if [[ -n $encryptinput ]]; then
      echo "The $1 is a valid sha1sum string"
      exit
    else
      encryptinput=$(echo $1 | grep -E -i '^[a-z0-9=]+${64}')
      if [[ -n $encryptinput ]]; then
        echo "The $1 is a valid sha256sum string"
        exit
      else
        encryptinput=$(echo $1 | grep -E -i '^[a-z0-9=]+${128}')
        if [[ -n $encryptinput ]]; then
          echo "The $1 is a valid sha512sum string"
          exit
        else
          echo "Unable to determine the hash function used to generate the input"
        fi
      fi
    fi
  fi
}
identify-hash $1
I know that hashes have a specific number of characters, but I don't know exactly why it's not working. Removing the {32} from the first grep pattern makes it answer md5sum, but then it assumes everything is md5sum.
Suggestions?
Answer:
I fixed your script. You would have spotted most of the issues yourself if you had run it through ShellCheck:
#!/usr/bin/env bash
identify_hash() {
  # local variables
  local -- encrypt_input
  local -- sumname
  # Capture the first run of hexadecimal digits
  if [[ "$1" =~ ([[:xdigit:]]+) ]]; then
    encrypt_input="${BASH_REMATCH[1]}"
  else
    encrypt_input=''
  fi
  # Determine name of sum algorithm based on length of encrypt_input
  case "${#encrypt_input}" in
    32) sumname=md5sum ;;
    40) sumname=sha1sum ;;
    64) sumname=sha256sum ;;
    128) sumname=sha512sum ;;
    *) sumname= ;;
  esac
  # If sum algorithm name found (sumname is not empty)
  if [ -n "$sumname" ]; then
    printf 'The %s is a valid %s string\n' "$encrypt_input" "$sumname"
  else
    printf 'Unable to determine the hash function used to generate the input\n' >&2
    exit 1
  fi
}
identify_hash "$1"
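A minimal sketch of the same idea, written as a function that returns a status instead of calling exit, so it can be reused or tested in the same shell. The name hash_algorithm is hypothetical. Like the script above, it grabs the first run of hexadecimal digits, so you can also feed it a line of md5sum-style output (the hash, two spaces, then the file name or -). The lengths follow from the digest sizes: each hex digit encodes 4 bits, so a 128-bit MD5 digest is 32 hex digits, SHA-1 (160 bits) is 40, SHA-256 is 64 and SHA-512 is 128.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the sum tool name for a hex digest.
# Returns 0 and prints e.g. "md5sum" on success, returns 1 otherwise.
hash_algorithm() {
  local digest=''
  # Capture the first run of hex digits (tolerates trailing "  -" or "  file")
  if [[ $1 =~ ([[:xdigit:]]+) ]]; then
    digest=${BASH_REMATCH[1]}
  fi
  # Number of hex digits = digest size in bits / 4
  case ${#digest} in
    32)  echo md5sum ;;
    40)  echo sha1sum ;;
    64)  echo sha256sum ;;
    128) echo sha512sum ;;
    *)   return 1 ;;
  esac
}

# md5sum of empty input prints "d41d8cd98f00b204e9800998ecf8427e  -"
hash_algorithm 'd41d8cd98f00b204e9800998ecf8427e  -'
hash_algorithm 'da39a3ee5e6b4b0d3255bfef95601890afd80709'
hash_algorithm 'not a hash' || echo 'unknown'
```

Because the pattern is not anchored, only the first run of hex digits is inspected: an input such as "The d41d8cd9..." would capture the single "e" from "The". Wrapping the group in ^...$ rejects anything that is not purely hex, at the cost of no longer accepting md5sum-style lines.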
Answer:
Further explaining @Gordon Davisson's comment, plus some basics for anyone who stops by.
NB: This answer is extremely simplified to apply only to the current question. Here's my preferred guide for more regex
Basics of regex

^ - start of a line
$ - end of a line
[...] - list of possible characters
  - a hyphen has special sauce: a-z = all lowercase (English) letters; 0-9 = all digits; etc.
  - it also accepts character classes, e.g. [:xdigit:] for hexadecimal characters; the expression is then [[:xdigit:]], i.e. [:class:] inside [...]
{...} - number of times the preceding expression should be matched
  - ^[a]{1}$ will match "a" but not "aa"
  - ^f[o]{2}d$ will match "food" but not "fod", "foood" or any other number of o's
  - ^[a-z]{4}$ will match "ball" ✔️ but not "buffalo" ❌, and "cove" ✔️ but not "cover" ❌; basically any line (because of the ^...$) consisting of exactly 4 lowercase (English) letters
  - {1,5} - at least 1 and at most 5
* - shorthand for {0,}, meaning 0 or any number of times
+ - shorthand for {1,}, meaning at least 1, but no upper limit
? - shorthand for {0,1}, meaning optional (0 or 1 times)
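You can try these examples directly in bash, whose [[ ... =~ ... ]] operator uses POSIX extended regular expressions (the same flavour as grep -E). The matches helper below is a hypothetical name used only for this demo; note that the regex is passed unquoted on the right of =~, since quoting it would make bash match it as a literal string.

```bash
#!/usr/bin/env bash

# Hypothetical demo helper: print "yes" if $1 matches the ERE in $2, else "no".
matches() {
  if [[ $1 =~ $2 ]]; then echo yes; else echo no; fi
}

matches a      '^[a]{1}$'    # yes
matches aa     '^[a]{1}$'    # no
matches food   '^f[o]{2}d$'  # yes
matches foood  '^f[o]{2}d$'  # no
matches ball   '^[a-z]{4}$'  # yes
matches cover  '^[a-z]{4}$'  # no
matches fd     '^fo*d$'      # yes (* allows zero o's)
matches fd     '^fo+d$'      # no  (+ needs at least one o)
matches color  '^colou?r$'   # yes (? makes the u optional)
matches colour '^colou?r$'   # yes
```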
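So ${32} is not "32 characters": $ is the end-of-line anchor, so the {32} is applied to the anchor rather than to the character class before it. What you need is ^[a-z0-9=]{32}$ instead, so that {32} counts the characters and $ still anchors the end of the line.
BUT, as Andrej Podzimek also pointed out in the comments, you need to match only hexadecimal characters: [0-9a-f] (combined with grep -i, which your script already uses), which is the same as [[:xdigit:]]. Either can be used.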
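Putting that together, here is a sketch of the question's grep approach with the count moved onto the character class and the $ anchor after it. The name identify_hash_grep is hypothetical; it returns a non-zero status instead of calling exit so the caller decides what to do, and it loops over the four lengths instead of nesting if/else.

```bash
#!/usr/bin/env bash

# Hypothetical rewrite of the question's function using grep -E.
# {N} now applies to [[:xdigit:]] and $ anchors the end of the line.
identify_hash_grep() {
  local len name
  for len in 32 40 64 128; do
    # -q: no output, only the exit status; [[:xdigit:]] covers 0-9, a-f, A-F
    if printf '%s\n' "$1" | grep -Eq "^[[:xdigit:]]{$len}\$"; then
      case $len in
        32)  name=md5sum ;;
        40)  name=sha1sum ;;
        64)  name=sha256sum ;;
        128) name=sha512sum ;;
      esac
      echo "The $1 is a valid $name string"
      return 0
    fi
  done
  echo "Unable to determine the hash function used to generate the input" >&2
  return 1
}

identify_hash_grep d41d8cd98f00b204e9800998ecf8427e
```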
PS
More basics
. (full stop/period) matches ANY single character, including spaces and special characters
(...) groups a pattern (and captures what it matched)
  - [a-z ]*(chicken).* will match anything from "chicken coop" to "chicken soup" and "please pass that chicken cookbook, Alex?"
  - note the space after z: it makes the space character (ASCII 32) one of the allowed characters
[.] means a literal period/full stop, not any character
With grep -i (as in your script), matching is case-insensitive, so [a-z] also matches uppercase letters
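A quick way to see these in bash: after a successful =~ match, BASH_REMATCH[0] holds the whole match and BASH_REMATCH[1] holds what the first (...) group captured, which is how the fixed script above pulls out the hex digits. The regex containing a space is kept in a variable, since an unquoted space inside [[ ... ]] would otherwise split the pattern.

```bash
#!/usr/bin/env bash

# . matches any single character; [.] matches only a literal period.
[[ axb =~ ^a.b$ ]]   && echo 'a.b matches axb'
[[ axb =~ ^a[.]b$ ]] || echo 'a[.]b does not match axb'
[[ a.b =~ ^a[.]b$ ]] && echo 'a[.]b matches a.b'

# (...) captures; the space inside [a-z ] allows spaces before "chicken".
re='[a-z ]*(chicken).*'
sentence='please pass that chicken cookbook'
if [[ $sentence =~ $re ]]; then
  echo "whole match: ${BASH_REMATCH[0]}"
  echo "group 1:     ${BASH_REMATCH[1]}"
fi
```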
PPS: If this is for homework, an assignment or other schoolwork, please say so in your question :)