Is there a command for substituting a set of characters by a set of strings?

Time:09-23

I would like to substitute a set of single-byte characters with a set of literal strings in a stream, without any constraint on the line size.

#!/bin/bash

for (( i = 1; i <= 0x7FFFFFFFFFFFFFFF; i++ ))
do
    printf '\a,\b,\t,\v'
done |
chars_to_strings $'\a\b\t\v' '<bell>' '<backspace>' '<horizontal-tab>' '<vertical-tab>'

The expected output would be:

<bell>,<backspace>,<horizontal-tab>,<vertical-tab><bell>,<backspace>,<horizontal-tab>,<vertical-tab><bell>...

I can think of a bash function that would do that, something like:

chars_to_strings() {
    local delim buffer
    while true
    do
        delim=''
        IFS='' read -r -d '.' -n 4096 buffer && (( ${#buffer} != 4096 )) && delim='.'

        if [[ -n "${delim:+_}" ]] || [[ -n "${buffer:+_}" ]]
        then
            # Do the replacements in "$buffer"
            # ...

            printf "%s%s" "$buffer" "$delim"
        else
            break
        fi
    done
}

But I'm looking for a more efficient way, any thoughts?

CodePudding user response:

Since you seem to be okay with ANSI-C quoting via $'...' strings, maybe use sed?

sed $'s/\a/<bell>/g; s/\b/<backspace>/g; s/\t/<horizontal-tab>/g; s/\v/<vertical-tab>/g'

Or, via separate commands:

sed -e $'s/\a/<bell>/g' \
    -e $'s/\b/<backspace>/g' \
    -e $'s/\t/<horizontal-tab>/g' \
    -e $'s/\v/<vertical-tab>/g'

Or, using awk, which replaces newline characters too (by customizing the Output Record Separator, i.e., the ORS variable):

$ printf '\a,\b,\t,\v\n' | awk -vORS='<newline>' '
  {
    gsub(/\a/, "<bell>")
    gsub(/\b/, "<backspace>")
    gsub(/\t/, "<horizontal-tab>")
    gsub(/\v/, "<vertical-tab>")
    print $0
  }
'
<bell>,<backspace>,<horizontal-tab>,<vertical-tab><newline>

CodePudding user response:

For a simple one-liner with reasonable portability, try Perl.

for (( i = 1; i <= 0x7FFFFFFFFFFFFFFF; i++ ))
do
    printf '\a,\b,\t,\v'
done |
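perl -pe '# in a Perl regex \b is a word boundary and \v matches any vertical
  # whitespace, so backspace and vertical tab are written as hex escapes
  s/\a/<bell>/g;
  s/\x08/<backspace>/g;s/\t/<horizontal-tab>/g;s/\x0B/<vertical-tab>/g'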

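Perl has no fixed limit on line length, so lines longer than its internal input buffer are not a problem. Note, though, that -p still reads one newline-terminated line at a time, so a stream that never contains a newline would have to be read in fixed-size records instead (for example, by setting $/ to a reference to an integer).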

Perl by itself is not POSIX, of course; but it can be expected to be installed on any even remotely modern platform (short of perhaps embedded systems etc).

CodePudding user response:

Assuming the overall objective is to provide the ability to process a stream of data in real time without having to wait for an EOL/end-of-buffer occurrence to trigger processing ...

A few items:

  • continue to use the while/read -n loop to read a chunk of data from the incoming stream and store in buffer variable
  • push the conversion code into something that's better suited to string manipulation (ie, something other than bash); for sake of discussion we'll choose awk
  • within the while/read -n loop printf "%s\n" "${buffer}" and pipe the output from the while loop into awk; NOTE: the key item is to introduce an explicit \n into the stream so as to trigger awk processing for each new 'line' of input; OP can decide if this additional \n must be distinguished from a \n occurring in the original stream of data
  • awk then parses each line of input as per the replacement logic, making sure to append anything leftover to the front of the next line of input (ie, for when the while/read -n breaks an item in the 'middle')

General idea:

chars_to_strings() {
    while read -r -n 15 buffer               # using '15' for demo purposes otherwise replace with '4096' or whatever OP wants
    do
        printf "%s\n" "${buffer}"
    done | awk '{print NR,FNR,length($0)}'   # replace 'print ...' with OP's replacement logic
}

Take for a test drive:

for (( i = 1; i <= 20; i++ ))
do  
    printf '\a,\b,\t,\v'
    sleep 0.1                 # add some delay to data being streamed to chars_to_strings()
done | chars_to_strings 

1 1 15                        # output starts printing right away
2 2 15                        # instead of waiting for the 'for'
3 3 15                        # loop to complete
4 4 15
5 5 13
6 6 15
7 7 15
8 8 15
9 9 15
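
With the replacement logic filled in, the pipeline above might look like the sketch below. It makes several assumptions: the name chars_to_strings_awk is hypothetical, the input is assumed to contain no newlines or NUL bytes (read cannot report whether a chunk ended at a newline), and read -n requires bash. Two details differ from the demo: IFS= stops read from stripping leading and trailing tabs from a chunk (which is presumably why one chunk in the demo output has length 13), and the extra test after read keeps a final chunk that is shorter than the requested size.

```shell
#!/bin/bash
# Hypothetical variant; assumes the input has no newlines or NUL bytes.
chars_to_strings_awk() {
    local buffer
    # IFS= preserves tabs at chunk edges; the -n test keeps a short last chunk
    while IFS= read -r -n 4096 buffer || [ -n "$buffer" ]
    do
        printf '%s\n' "$buffer"    # the added \n lets awk process each chunk right away
    done |
    awk -v ORS='' '
    {
        gsub(/\a/, "<bell>")
        gsub(/\b/, "<backspace>")
        gsub(/\t/, "<horizontal-tab>")
        gsub(/\v/, "<vertical-tab>")
        print       # ORS is empty, so the added \n is dropped again
        fflush()    # emit output chunk by chunk
    }'
}
```

Since each replaced item is a single character, nothing can be split across two chunks, so no leftover handling is needed in this particular case.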

A variation on this idea using a named pipe:

mkfifo /tmp/pipeX

sleep infinity > /tmp/pipeX &                      # run in background; keeps pipe open so awk does not exit

awk '{print NR,FNR,length($0)}' < /tmp/pipeX &

chars_to_strings() {
    while read -r -n 15 buffer
    do
        printf "%s\n" "${buffer}"
    done > /tmp/pipeX
}

Take for a test drive:

for (( i = 1; i <= 20; i++ ))
do
    printf '\a,\b,\t,\v'
    sleep 0.1
done | chars_to_strings

1 1 15                        # output starts printing right away
2 2 15                        # instead of waiting for the 'for'
3 3 15                        # loop to complete
4 4 15
5 5 13
6 6 15
7 7 15
8 8 15
9 9 15

# kill background 'awk' and/or 'sleep infinity' when no longer needed