Get width of fields from string

I have the string below, and I'm trying to get the width of each field (counting the trailing spaces) and store the widths in a variable in the format pos="len1 len2 len3 ... lenN"

string="AB        BC_DEF     GH   I       "

The field lengths would then be:

1st field: AB with length of 10
2nd field: BC_DEF with length of 11
3rd field: GH with length of 5
4th field: I with length of 8

My current attempt is below:

string="AB        BC_DEF     GH   I       " 
words=("AB" "BC_DEF" "GH" "I")

for w in "${words[@]}"; do
    pos="$pos $(echo "$string" | grep -bo "$w" | sed 's/:.*$//')"
done

$ echo $pos
0 10 21 26

But this gives me the starting offset of each "word" rather than its width. Is there an easy way to get what I'm after? Thanks

CodePudding user response：

$ <<<"$string" awk 'BEGIN{FS=ORS;ORS=OFS}{gsub(/ +/,"&\n")}{for(i=1;i<NF;++i) print length($i)}'
10 11 5 8 
$ <<<"$string" sed 's/ \+/&\n/g' | awk 'BEGIN{ORS=OFS}{print length}'
10 11 5 8 0
$ <<<"$string" awk 'BEGIN{RS=" +";ORS=OFS}RT{print length()+length(RT)}'
10 11 5 8
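
To get the result into the pos variable the question asks for, the first awk variant can be wrapped in a command substitution. This is only a sketch, assuming the loop increments with ++i and the pattern is / +/ (one or more spaces); the final parameter expansion is my addition, used to drop the trailing separator that ORS leaves behind.

```shell
#!/usr/bin/env bash
string="AB        BC_DEF     GH   I       "

# Put a newline after every run of spaces, split the record on newlines,
# and print the length of each field except the empty last one.
pos=$(awk 'BEGIN{FS=ORS;ORS=OFS}{gsub(/ +/,"&\n")}{for(i=1;i<NF;++i) print length($i)}' <<<"$string")

# awk ends its output with ORS (a space); strip that one trailing space.
pos=${pos% }
echo "$pos"
```

Command substitution only strips trailing newlines, which is why the trailing space has to be removed explicitly.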

CodePudding user response：

Eat the string one token at a time. You could use bash, for example, or anything else, and express the tokenization as a regex or just a loop. For example:

string="AB        BC_DEF     GH   I       " 
while ((${#string})); do
    [[ "$string" =~ ^([^ ]* *)(.*) ]]
    echo "elem: '${BASH_REMATCH[1]}'"
    string=${BASH_REMATCH[2]}
done

Will output:

elem: 'AB        '
elem: 'BC_DEF     '
elem: 'GH   '
elem: 'I       '

I.e. "eat" tokens from the input, working from the beginning to the end.
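
The same loop can collect the widths instead of printing the tokens. A minimal sketch, assuming the goal is the pos="len1 len2 ..." format from the question; the names rest, re and widths are mine and only illustrative. The regex is kept in a variable so the spaces in it need no special quoting inside [[ ]].

```shell
#!/usr/bin/env bash
string="AB        BC_DEF     GH   I       "

rest=$string              # work on a copy so $string stays intact
re='^([^ ]* *)(.*)'       # group 1: a word plus its trailing spaces, group 2: the remainder
widths=()

while ((${#rest})); do
    [[ $rest =~ $re ]]
    widths+=("${#BASH_REMATCH[1]}")   # length of the token just eaten
    rest=${BASH_REMATCH[2]}
done

pos="${widths[*]}"        # joined with the first character of IFS (a space by default)
echo "$pos"
```

Note that if the string starts with spaces, the first "field" is that run of leading spaces, so its width shows up as the first number.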

Alternatively, you could just let sed insert NUL delimiters and read the result into an array:

string="AB        BC_DEF     GH   I       "
readarray -d '' -t arr < <(printf "%s" "$string" | sed 's/[^ ]* */&\x00/g')
declare -p arr

But the pure bash loop avoids spawning external processes, so it will likely be faster.
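
Once the fields are in arr, the widths follow from ${#...} on each element. A rough sketch, assuming GNU sed (for \x00 in the replacement) and bash 4.4 or newer (for readarray -d ''); the names lens and field are only illustrative.

```shell
#!/usr/bin/env bash
string="AB        BC_DEF     GH   I       "

# Split into NUL-terminated tokens, each a word plus its trailing spaces.
readarray -d '' -t arr < <(printf "%s" "$string" | sed 's/[^ ]* */&\x00/g')

lens=()
for field in "${arr[@]}"; do
    lens+=("${#field}")   # width of the field including its padding
done

pos="${lens[*]}"
echo "$pos"
```

Using printf "%s" rather than echo matters here: a trailing newline in the input would end up as an extra one-character element at the end of the array.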