Bash: find if all characters in one string occur within another string

I am new to bash. I have a question about determining if all characters of one string occur within another string. For example, if the variables are...

var_1="abcdefg"
var_2="bcg"

Then I want to write an "if" statement of the form...

if [all characters of var_2 occur within var_1]
then
     echo "All characters of var_2 occur in var_1."
else
     echo "Not all characters of var_2 occur in var_1."
fi

In this example, the output should be "All characters of var_2 occur in var_1." What would go in the "if" statement here?

This is what I tried...

if [[ $var_1 == *$var_2* ]]

but I think this only determines whether var_2 is a substring of var_1. What I want is to determine whether the characters of var_2 occur within var_1, in any order.

CodePudding user response：

[[ $var_1 == *$var_2* ]] is not quite the same as "var_2 is a substring of var_1": because $var_2 is unquoted, its value is treated as a glob pattern rather than literal text. For example, if var_2=* then the condition always evaluates to true.
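
To make the difference concrete, here is a small sketch comparing the unquoted and quoted forms when var_2 holds a glob character. The variable names unquoted and quoted are only there for illustration:

```bash
#!/bin/bash

var_1="abcdefg"
var_2='*'

# Unquoted: the value of var_2 is used as a pattern, so "*" matches anything.
if [[ $var_1 == *$var_2* ]]; then unquoted="match"; else unquoted="no match"; fi

# Quoted: the value is matched literally, and var_1 contains no "*" character.
if [[ $var_1 == *"$var_2"* ]]; then quoted="match"; else quoted="no match"; fi

echo "unquoted: $unquoted"   # prints "unquoted: match"
echo "quoted:   $quoted"     # prints "quoted:   no match"
```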

For your current problem you can iterate over each character of var_2 and check if var_1 contains it (without forgetting to quote the expansion):

#!/bin/bash

var_1="abcdefg"
var_2="bcg"

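# Stop at the first character of var_2 that is missing from var_1.
for (( ok = 1, i = 0; ok && i < ${#var_2}; i++ ))
do
    # The quotes make the character match literally rather than as a glob.
    [[ $var_1 == *"${var_2:i:1}"* ]] || ok=0
done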

if (( ok ))
then
    echo "All characters of var_2 occur in var_1."
else
    echo "Not all characters of var_2 occur in var_1."
fi
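
If you need this check in more than one place, the same loop could be wrapped in a function and used directly as the "if" condition. This is only a sketch: the name all_chars_in and its argument order are my own choices, not anything built into bash.

```bash
#!/bin/bash

# all_chars_in NEEDLES HAYSTACK   (hypothetical helper name)
# Returns 0 if every character of NEEDLES occurs somewhere in HAYSTACK.
all_chars_in() {
    local needles=$1 haystack=$2 i
    for (( i = 0; i < ${#needles}; i++ ))
    do
        # Quoted so that characters such as * or ? are matched literally.
        [[ $haystack == *"${needles:i:1}"* ]] || return 1
    done
    return 0
}

var_1="abcdefg"
var_2="bcg"

if all_chars_in "$var_2" "$var_1"
then
    echo "All characters of var_2 occur in var_1."
else
    echo "Not all characters of var_2 occur in var_1."
fi
```

Note that an empty var_2 counts as success here, since there is no character that could be missing.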

CodePudding user response：

The following one-liner should work:

echo -e "$var_2\0$var_1" | sed -E ':a;s/(.)(.*\x0)(.*)\1(.*)/\2\3\4/;ta;s/^\x0.*/1/;s/.*\x0.*/0/'

It will print 0 or 1 to mean false or true respectively.
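
To use that result in an "if" statement, one option is to capture the output with command substitution. The sketch below assumes GNU sed (for -E and \x0). It also feeds sed with printf instead of echo -e, which is a substitution on my part: echo -e reads \0 followed by digits as an octal escape, so a var_1 starting with a digit from 0 to 7 would be mangled, while printf '%s\0%s\n' passes both strings through untouched.

```bash
#!/bin/bash

var_1="abcdefg"
var_2="bcg"

# Assumes GNU sed. The sed script itself is unchanged from the one-liner;
# only the way the two strings are joined with a null character differs.
result=$(printf '%s\0%s\n' "$var_2" "$var_1" |
    sed -E ':a;s/(.)(.*\x0)(.*)\1(.*)/\2\3\4/;ta;s/^\x0.*/1/;s/.*\x0.*/0/')

if [[ $result == 1 ]]
then
    echo "All characters of var_2 occur in var_1."
else
    echo "Not all characters of var_2 occur in var_1."
fi
```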

This is how it works:

echo -e allows using escape sequences, and \0 represents the null character, which I'm using to mark the separation between the two strings bcg and abcdefg.
The sed script is not that complex:
- -E is a non POSIX option that lets you write capturing groups with ( and ) instead of \( and \) (along with other similar simplifications which I'm not using here);
- ; separates commands;
- :a is a label, which allows jumping here via ta or ba (I use only the former, keep reading);
- s/(.)(.*\x0)(.*)\1(.*)/\2\3\4/ does the following (and succeeds if there's at least one character in common between var_2 and var_1):
  - matches and captures the first character of var_2 with (.),
  - matches and captures the following part of var_2 together with the null character, (.*\x0),
  - matches and captures 0 or more characters,
  - matches what was captured by first group, i.e. by (.),
  - matches and captures 0 or more characters up to the end of var_1,
  - substitutes all that was matched with what was captured by the 2nd, 3rd, and 4th capturing groups: in fact, we've got rid of one character in common between var_2 and var_1;
- ta tests whether the previous substitution was successful and, if so, jumps to :a; this way we run a loop as long as there's a character in common between var_2 and var_1;
- when there are no characters in common between var_2 and var_1, the test fails and control falls through ta;
- s/^\x0.*/1/ matches whatever is left, but only if the null character \x0 is leading, which happens if all letters of var_2 were found in var_1, and changes everything to just 1;
- s/.*\x0.*/0/ will match everything, as long as there's still \x0 in the string, which happens only if the previous substitution failed, which means that some letter from var_2 was not found in var_1, and changes it to 0.
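
For readers who find the sed loop hard to follow, the same "find a common character, remove it from both strings, repeat" idea can be mirrored in plain bash. This is a rough sketch of the logic and not a translation of the script; consume_chars is a hypothetical name.

```bash
#!/bin/bash

# consume_chars NEEDLES HAYSTACK   (hypothetical helper name)
# For each character of NEEDLES, remove one matching character from a
# copy of HAYSTACK. Fails as soon as a character cannot be found.
consume_chars() {
    local needles=$1 haystack=$2 c i
    for (( i = 0; i < ${#needles}; i++ ))
    do
        c=${needles:i:1}
        [[ $haystack == *"$c"* ]] || return 1
        # Delete only the first occurrence, like one pass of the sed loop.
        haystack=${haystack/"$c"/}
    done
    return 0
}

consume_chars "bcg" "abcdefg" && echo 1 || echo 0   # prints 1
consume_chars "bb"  "abc"     && echo 1 || echo 0   # prints 0
```

The second call shows a consequence of removing matched characters: a letter repeated in var_2 has to appear at least as many times in var_1. The loop in the first answer checks each character independently, so it would accept "bb" against "abc". Which behavior you want depends on how you read the question.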