I have the string below, and I'm trying to get the width of each field (counting its trailing spaces) and store the widths in a variable
in the format pos="len1 len2 len3 ... lenN"
string="AB        BC_DEF     GH   I       "
The field lengths would then be:
- 1st field: "AB        " with length 10
- 2nd field: "BC_DEF     " with length 11
- 3rd field: "GH   " with length 5
- 4th field: "I       " with length 8
My current attempt is below:
string="AB        BC_DEF     GH   I       "
words=("AB" "BC_DEF" "GH" "I")
for w in "${words[@]}"; do
pos="$pos $(echo "$string" | grep -bo "$w" | sed 's/:.*$//')"
done
$ echo $pos
0 10 21 26
But this gives me the starting offset of each "word" rather than its width. Is there an easy way to get what I'm after? Thanks.
CodePudding user response:
$ <<<"$string" awk 'BEGIN{FS=ORS;ORS=OFS}{gsub(/ +/,"&\n")}{for(i=1;i<NF;i++) print length($i)}'
10 11 5 8
$ <<<"$string" sed 's/ \+/&\n/g' | awk 'BEGIN{ORS=OFS}{print length}'
10 11 5 8 0
$ <<<"$string" awk 'BEGIN{RS=" +";ORS=OFS}RT{print length + length(RT)}'
10 11 5 8
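Note that RT in the last command is a gawk extension. If gawk is not available, something along these lines should work in any POSIX awk: it uses match() with RSTART/RLENGTH to peel off one field plus its trailing spaces per iteration. This is only a sketch. The sample string is rebuilt with printf from the widths stated in the question, and the names s, out and sep are just illustrative.

```shell
# Rebuild the sample: fields padded to widths 10, 11, 5 and 8.
printf -v string '%-10s%-11s%-5s%-8s' AB BC_DEF GH I

pos=$(awk '{
    s = $0; out = ""
    # Each match is one run of non-spaces plus the spaces that follow it.
    while (match(s, /[^ ]+ */)) {
        sep = (out == "" ? "" : " ")
        out = out sep RLENGTH
        s = substr(s, RSTART + RLENGTH)
    }
    print out
}' <<<"$string")

echo "$pos"
```

Leading spaces before the first field, if any, are skipped rather than counted.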
CodePudding user response:
Eat the string one token at a time. You could use bash, or anything else, and express the tokenization as a regex or as a plain loop. For example:
string="AB        BC_DEF     GH   I       "
while ((${#string})); do
[[ "$string" =~ ^([^ ]* *)(.*) ]]
echo "elem: '${BASH_REMATCH[1]}'"
string=${BASH_REMATCH[2]}
done
Will output:
elem: 'AB        '
elem: 'BC_DEF     '
elem: 'GH   '
elem: 'I       '
I.e. "eat" tokens from the input, working from the beginning to the end.
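To get the widths into pos as the question asks, the same loop can record the length of each match instead of echoing it. A minimal sketch, assuming the sample widths from the question; keeping the regex in a variable (re, my name) and working on a copy (rest) are my choices, so the original string is left untouched.

```shell
# Rebuild the sample: fields padded to widths 10, 11, 5 and 8.
printf -v string '%-10s%-11s%-5s%-8s' AB BC_DEF GH I

# Group 1: one token plus its trailing spaces; group 2: the remainder.
re='^([^ ]* *)(.*)'

pos=""
rest=$string
while ((${#rest})); do
    [[ $rest =~ $re ]] || break
    # Append the width of the consumed token, space-separated.
    pos+="${pos:+ }${#BASH_REMATCH[1]}"
    rest=${BASH_REMATCH[2]}
done

echo "$pos"
```

Storing the pattern in a variable avoids any quoting surprises with the spaces inside [[ ... =~ ... ]].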
Alternatively, you could split on the same pattern with sed and read the pieces into an array:
string="AB        BC_DEF     GH   I       "
readarray -d '' -t arr < <(printf "%s" "$string" | sed 's/[^ ]* */&\x00/g')
declare -p arr
But pure bash will be faster, since it does not spawn an external process.