Can you use shell commands to alter blocks of lines based on values in those lines?

Time:12-06

I have a file that has multiple blocks of lines like so

line1
line1
-----
machine:chrome
purpose:
language:v2
request:v3
additional: v4
os:v4
-----
machine:firefox
purpose:
language:v2
request:v6
os:v4
-----
machine:helper
purpose:
language:v2
request:v8
os:v4
-----
another line

The blocks don't necessarily contain the same fields, but each one starts with a machine line and ends with an os line. I can only use shell commands. For each block, I want to parse the value from the machine line, use that value in a command, and insert the command's output into the request line.

In other words: parse each line that starts with machine, run a shell command with that value, and populate request with the result. As a challenge, I was wondering if this could be done using only sed and awk.

My expected output for the above would be:

line1
line1
-----
machine:chrome
purpose:
language:v2
request:[output of result of ps -ef chrome | awk '{print $2}']
additional: v4
os:v4
-----
machine:firefox
purpose:
language:v2
request:[output of result of ps -ef firefox | awk '{print $2}']
os:v4
-----
machine:helper
purpose:
language:v2
request:[output of result of ps -ef helper | awk '{print $2}']
os:v4
-----
another line

Update: Trying to do this in sed alone I got the following:

gsed -r '/machine/,/os/ {/machine/ {s/.*?:\s*([^\s]+)/ps -ef | grep \1\n/e}; /request/ {s/.*?:\s*([^\s]+)//}}' filename

This does not work, but it does run ps -ef | grep [machinename] and store the output in the buffer. Now I'd like to know whether I can use the buffer value in the request substitution, and if so, how?


CodePudding user response:

Edit: Because the requirements changed, I am updating the script. The following produces the required output:

#!/bin/bash
function processlines() {
    local line machine request
    
    # Skips first three lines, but it is not really necessary.
    #for k in  `seq 1 3`; do read line; echo "$line"; done
    
    while true; do
        read -r line || return 0
        
        if echo "$line" | grep -q '^machine:'; then
            machine="$(echo "$line" | cut -d ':' -f 2)"
            echo "$line"
        elif echo "$line" | grep -q '^request:'; then
            request="$(echo YOUR COMMAND HERE "$machine")"
            echo "request:$request"
        else
            echo "$line"
        fi
    done
}

processlines < test.txt

Note: This works as long as the fields appear in the order you showed. If request appears before machine, or if either of them is missing from a block, the script breaks. Please let me know if that can happen.
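Whether that ordering assumption holds can be checked before running the script. The following is only a minimal sketch, assuming blocks are delimited by ----- lines as in the sample; check_order is a hypothetical helper name, not part of the script above.

```shell
#!/usr/bin/env bash
# check_order FILE  (hypothetical helper)
# Reports every request: line that has no machine: line before it in the
# same block. Assumes blocks are separated by lines of exactly "-----".
check_order() {
    awk '
        /^-----$/   { seen = 0 }            # a separator starts a new block
        /^machine:/ { seen = 1 }
        /^request:/ && !seen {
            printf "line %d: request without a preceding machine\n", NR
            bad = 1
        }
        END { exit bad + 0 }                # non-zero status if anything was reported
    ' "$1"
}
```

Calling check_order test.txt prints nothing and returns 0 when every request line follows a machine line in its block.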

Old answer: You don't need sed or awk for that. It's doable with almost pure bash, plus tail and cut:

cat test.txt | tail -n +4 | while read machineline; do
    [[ "$machineline" == "another line" ]] && break

    read purposeline
    read languageline
    read requestline
    read osline
    read separatorline
    
    machine="$(echo $machineline | cut -d ':' -f 2)"
    purpose="$(echo $purposeline | cut -d ':' -f 2)"
    language="$(echo $languageline | cut -d ':' -f 2)"
    request="$(echo $requestline | cut -d ':' -f 2)"
    os="$(echo $osline | cut -d ':' -f 2)"
    separator="$(echo $separatorline | cut -d ':' -f 2)"
    
    # Here do anything with the variables...
    echo "machine is '$machine'" \
         "purpose is '$purpose'" \
         "language is '$language'" \
         "request is '$request'" \
         "os is '$os'" \
         "separator is '$separator'"
done

And if you need the "machine" value only, then it is way easier:

cat test.txt | grep '^machine:' | cut -d ':' -f 2 | while read machinevalue; do
    # call your other command here...
    echo "machine value is '$machinevalue'"
done

A word of caution: If your values contain the character ":" this script would break and then you would have to use sed 's/^machine://g' instead of cut -d ':' -f 2.

A possible optimization would be to use bash for extracting the parts of the string but I am too lazy for that and unless I need the performance, I prefer using shell commands because I remember them more easily.
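For reference, the bash-only extraction mentioned above could look roughly like this. It is a sketch using standard parameter expansion; field_name and field_value are hypothetical helper names chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: split "tag:value" at the FIRST colon only,
# without spawning cut or sed.
field_name()  { printf '%s\n' "${1%%:*}"; }   # text before the first ":"
field_value() { printf '%s\n' "${1#*:}"; }    # text after the first ":"
```

Unlike cut -d ':' -f 2, which returns only a for the input machine:a:b, field_value returns a:b, so values containing ":" survive. A line with no colon at all is returned unchanged by both helpers.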

CodePudding user response:

Regarding "I was wondering if this could be done using only sed and awk": no, it can't. The task requires a shell to call ps, so any sed or awk script would need to spawn a subshell to call ps; they can't call it on their own. In terms of calls, you'd end up with something like shell { awk { system { subshell { ps } } } } (which clearly isn't only using awk anyway) instead of simply shell { ps }.
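To make that nesting concrete, here is a rough sketch of what the awk-driven variant would look like. It uses awk's command | getline form, which hands the command string to a shell for each request line. The function name fill_requests_awk and the shquote helper are made up for this illustration, and it keeps only the first line of the command's output.

```shell
#!/usr/bin/env bash
# fill_requests_awk CMD FILE  (hypothetical name, illustration only)
# awk builds "CMD 'machine-name'" for each request: line and reads the
# first line of its output; awk starts a shell to run that string.
fill_requests_awk() {
    awk -v cmd="$1" '
        function shquote(s) {            # wrap s in single quotes for the shell
            gsub(/\047/, "\047\\\047\047", s)
            return "\047" s "\047"
        }
        /^machine:/ { mac = $0; sub(/^machine:/, "", mac) }
        /^request:/ && mac != "" {
            full = cmd " " shquote(mac)
            result = ""
            full | getline result        # one extra shell per request line
            close(full)
            $0 = "request:" result
        }
        { print }
    ' "$2"
}
```

For example, fill_requests_awk echo file would replace each request value with the machine name. Every request line costs one extra shell process, which is the overhead described above.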

The examples below use md5sum (a very common application for this technique) instead of ps -ef, because ps -ef would produce different output on everyone's machine; you can swap ps -ef back in later. You COULD do the following (but don't; see the 2nd script below for a better approach):

$ cat tst.sh
#!/usr/bin/env bash

infile="$1"

while IFS= read -r line; do
    if [[ "$line" =~ ^([^:]+):(.*) ]]; then
        tag="${BASH_REMATCH[1]}"
        val="${BASH_REMATCH[2]}"
        case "$tag" in
            machine )
                mac="$val"
                ;;
            request )
                val="$(printf '%s' "$mac" | md5sum | cut -d' ' -f1)"
                ;;
        esac
        line="${tag}:${val}"
    fi
    printf '%s\n' "$line"
done < "$infile"

$ ./tst.sh file
line1
line1
-----
machine:chrome
purpose:
language:v2
request:554838a8451ac36cb977e719e9d6623c
additional: v4
os:v4
-----
machine:firefox
purpose:
language:v2
request:d6a5c9544eca9b5ce2266d1c34a93222
os:v4
-----
machine:helper
purpose:
language:v2
request:fde5d67bfb6dc4b598291cc2ce35ee4a
os:v4
-----
another line

While the above would work, it would be very inefficient (see why-is-using-a-shell-loop-to-process-text-considered-bad-practice), since it loops through every line of input in shell. The following is how I'd really approach a task like this. It is far more efficient because the shell only loops through the machine: lines from the input (which is unavoidable, and is far fewer iterations than reading every input line). The rest is done with a single call to sed to generate the input for the shell loop and a single call to awk to produce the output:

$ cat tst.sh
#!/usr/bin/env bash

infile="$1"

sed -n 's/^machine://p' "$infile" |
while IFS= read -r mac; do
    printf '%s\n%s\n' "$mac" "$(printf '%s' "$mac" | md5sum | cut -d' ' -f1)"
done |
awk '
    NR==FNR {
        if ( NR % 2 ) {
            mac = $0
        }
        else {
            map[mac] = $0
        }
        next
    }
    {
        tag = val = $0
        sub(/:.*/,"",tag)
        sub(/[^:]*:/,"",val)
    }
    tag == "machine" { mac = val }
    tag == "request" { $0 = tag ":" map[mac] }
    { print }
' - "$infile"

$ ./tst.sh file
line1
line1
-----
machine:chrome
purpose:
language:v2
request:554838a8451ac36cb977e719e9d6623c
additional: v4
os:v4
-----
machine:firefox
purpose:
language:v2
request:d6a5c9544eca9b5ce2266d1c34a93222
os:v4
-----
machine:helper
purpose:
language:v2
request:fde5d67bfb6dc4b598291cc2ce35ee4a
os:v4
-----
another line

Here's what each of the above steps does:

  1. Get just the machine: lines and remove the machine: part, so the shell loops through only the values it needs to call some command (e.g. ps -ef or md5sum) on:
$ sed -n 's/^machine://p' "$infile"
chrome
firefox
helper
  2. Loop through each of those lines, producing a mapping from that word to the output of the shell command you need to run on it (we generate the mapping in pairs of lines so the subsequent awk can parse it robustly even if the machine name from the input contained :s):
$ sed -n 's/^machine://p' "$infile" |
while IFS= read -r mac; do
    printf '%s\n%s\n' "$mac" "$(printf '%s' "$mac" | md5sum | cut -d' ' -f1)"
done
chrome
554838a8451ac36cb977e719e9d6623c
firefox
d6a5c9544eca9b5ce2266d1c34a93222
helper
fde5d67bfb6dc4b598291cc2ce35ee4a
  3. Pass that mapping to awk on stdin (the - argument), which reads it in pairs of lines, taking each odd-numbered line as a machine name and the following even-numbered line as the command output for it, and stores the mapping in an array:
NR==FNR {
    if ( NR % 2 ) {
        mac = $0
    }
    else {
        map[mac] = $0
    }
    next
}

  4. awk then reads the input file and uses that array to modify the request: lines before printing each line. It separates each line into the part before the first : (which I'm calling a tag) and the part after it (which I'm calling a value), populating tag and val with sub() instead of setting FS to : and using $1 and $2, so we can again handle input that contains :s in other locations:
{
    tag = val = $0
    sub(/:.*/,"",tag)
    sub(/[^:]*:/,"",val)
}
tag == "machine" { mac = val }
tag == "request" { $0 = tag ":" map[mac] }
{ print }

The above assumes the shell command output is a single line each time it's called.
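If the real command can print several lines (for instance several PIDs from ps), one option is to collapse its output to a single line before it enters the mapping. A minimal sketch, assuming a space-separated list is acceptable in the request field; one_line is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# one_line CMD [ARGS...]  (hypothetical helper)
# Runs CMD and joins all of its output lines into one space-separated line.
one_line() {
    "$@" | paste -s -d ' ' -
}
```

In the loop, the md5sum pipeline inside the command substitution would then be replaced by something like one_line some_command "$mac", where some_command stands for whatever you actually need to run.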

CodePudding user response:

In pure bash:

#!/bin/bash

while IFS= read -r line; do
    if [[ $line = machine:* ]]; then mach=${line#*:}
    elif [[ $line = os:* ]]; then mach=""; fi

    if [[ $line = request:* && $mach ]]; then
        printf 'request:'
        your_command "$mach"
    else
        printf '%s\n' "$line"
    fi
done < file

If the output of your command doesn't end with a newline character, then place an echo after your command.
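An alternative to the extra echo is to capture the output with command substitution, which strips trailing newlines, and let printf add exactly one. The sketch below wraps the loop in a function for illustration; fill_requests and its argument order are assumptions, not part of the answer above.

```shell
#!/usr/bin/env bash
# fill_requests FILE CMD [ARGS...]  (hypothetical wrapper)
# Replaces each request: value with the output of CMD ARGS... MACHINE.
fill_requests() {
    local file=$1 line mach=""
    shift
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line == machine:* ]]; then
            mach=${line#*:}
        elif [[ $line == os:* ]]; then
            mach=""
        fi
        if [[ $line == request:* && -n $mach ]]; then
            # $(...) drops trailing newlines; printf adds exactly one back.
            # </dev/null keeps the command from consuming the input file.
            printf 'request:%s\n' "$("$@" "$mach" </dev/null)"
        else
            printf '%s\n' "$line"
        fi
    done < "$file"
}
```

With this form, the output is the same whether or not the command ends its output with a newline.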
