Bash Run-length Encoding script

Original Q. Given a string containing uppercase characters (A-Z), compress the string using run-length encoding: each run of a repeated character is replaced by the length of the run followed by the character. Write a Unix shell function encode(message) that performs the run-length encoding for a given string and returns the encoded string. Provide different string values and test your program. Example: message=AAAABBBBCCCCCCCC output: 4A4B8C

#!/usr/bin/bash

encode()
{
        msg=$1

        for (( i=0 ; i<${#msg} ; i++ ))
        do
                j=$(($i+1))
                if [[ $j < ${#msg} ]] && [[ ${msg:$i:1} == ${msg:$j:1} ]]
                then
                        echo "${msg:$i:1} == ${msg:$j:1}"
                else
                        echo "${msg:$i:1} != ${msg:$j:1}"
                fi
        done
}

#read -p "Enter String to Encrypt : " str
str='AAAABBBBCCCCCCCC'

if [ ${#str} -eq 0 ] || ! [[ $str =~ [a-zA-Z]+$ ]]
then
        echo -e "\n===> Invalid String <===\n"
        exit
fi

echo -e "Input  => $str"
encode $str

Getting OUTPUT :

[practiceScript]$ bash 20.sh
Input  => AAAABBBBCCCCCCCC
A == A
A != A
A != A
A != B
B != B
B != B
B != B
B != C
C != C
C == C
C == C
C == C
C == C
C == C
C == C
C !=
[practiceScript]$

I want to understand why my script prints "!=" even when the two characters being compared are the same.

CodePudding user response：

I suggest:

# Strip the trailing newline that echo adds, put every character on its
# own line, count runs of identical adjacent lines, then delete all
# newlines and spaces
encode() {
  echo "$1" | tr -d '\n' | sed "s/./&\n/g" | uniq -c | tr -d '\n '
}

message="AAAABBBBCCCCCCCC"
encode "$message"

Output:

4A4B8C
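
To see how the pipeline arrives at that result, it can help to run it one stage at a time. The following is only a sketch: it assumes GNU sed (where `\n` in the replacement text inserts a newline), and the shortened input `AABBB` and the variable `result` are chosen here for illustration.

```bash
#!/usr/bin/env bash
# Assumes GNU sed, where \n in the replacement text inserts a newline.
msg="AABBB"   # shortened, illustrative input

# Stage 1: put every character on its own line.
printf '%s' "$msg" | sed 's/./&\n/g'

# Stage 2: uniq -c collapses adjacent identical lines and prefixes each
# with its count (padded with spaces).
printf '%s' "$msg" | sed 's/./&\n/g' | uniq -c

# Stage 3: delete the newlines and the padding to get the encoded string.
result=$(printf '%s' "$msg" | sed 's/./&\n/g' | uniq -c | tr -d '\n ')
echo "$result"
```

With GNU sed the last line prints 2A3B. Because `uniq -c` only merges adjacent duplicates, runs are counted separately, which is what run-length encoding needs.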

CodePudding user response：

A method in pure bash to achieve this task:

#!/bin/bash

encode () {
    local msg=$1 prefix pat

    while [[ $msg ]]; do
        pat=[^${msg:0:1}]*
        prefix=${msg%%$pat}
        printf '%d%s' ${#prefix} "${prefix:0:1}"
        msg=${msg#"$prefix"}
    done
    echo
}

encode AAAABBBBCCCCCCCC
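
As for why the loop in the question printed "!=" for equal characters: inside `[[ ]]`, the `<` operator compares strings lexicographically, not numerically. So `[[ 2 < 16 ]]` is false, because the character `2` sorts after `1`. That matches the output shown in the question, where the bounds check passes only for j=1 and for j=10 through 15. The following is a minimal sketch, not the original poster's final code, that keeps the same loop but uses arithmetic comparison with `(( ))` and adds a run counter. The names `encode_loop`, `count` and `out` are introduced here for illustration.

```bash
#!/usr/bin/env bash

# Inside [[ ]], < is a string comparison; (( )) compares numerically.
if [[ 2 < 16 ]]; then
    echo "string comparison: 2 < 16"
else
    echo "string comparison: 2 is not < 16"
fi
if (( 2 < 16 )); then
    echo "arithmetic comparison: 2 < 16"
fi

# Hypothetical loop-based encoder (name and variables are illustrative).
encode_loop() {
    local msg=$1 out="" count=1 i j
    for (( i = 0; i < ${#msg}; i++ )); do
        j=$(( i + 1 ))
        # Numeric bounds check, then compare neighbouring characters.
        if (( j < ${#msg} )) && [[ ${msg:i:1} == "${msg:j:1}" ]]; then
            count=$(( count + 1 ))
        else
            # The run ends here: emit its length and character.
            out+="${count}${msg:i:1}"
            count=1
        fi
    done
    printf '%s\n' "$out"
}

encode_loop AAAABBBBCCCCCCCC
```

In the question's script, `[[ $j -lt ${#msg} ]]` would be an equivalent fix for the bounds check, since `-lt` compares integers.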