Return Values Below String Pattern Match in Number Range

Time:11-19

I need to match and return the values below the number range 12-00 in the first line/row (UTC) of the following text file:

UTC  06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 00 01 02 03 04 05 06 
TMP  54 53 52 50 49 48 47 47 47 48 48 48 48 48 47 45 44 43 43 41 40 39 38 37 36

That is, match 12 13 14 15 16 17 18 19 20 21 22 23 00 in line 1 and return 47 47 47 48 48 48 48 48 47 45 44 43 43 from line 2.

My Attempt:

cat some.text.file | head -n 3 | grep -A 1 '12.*.00' | tail -n 1

Result:

TMP  54 53 52 50 49 48 47 47 47 48 48 48 48 48 47 45 44 43 43 41 40 39 38 37 36

Expected Result:

12 13 14 15 16 17 18 19 20 21 22 23 00
47 47 47 48 48 48 48 48 47 45 44 43 43

CodePudding user response:

This can be done in a single awk:

awk 'NR == 1 {for (i=1; i<=NF; ++i) if ($i == "12") start = i; else if ($i == "00") stop = i} {for (i=start; i<=stop; ++i) printf "%s", $i (i < stop ? OFS : ORS)}' file

12 13 14 15 16 17 18 19 20 21 22 23 00
47 47 47 48 48 48 48 48 47 45 44 43 43

A more readable version:

awk 'NR == 1 {
   for (i=1; i<=NF; ++i)
      if ($i == "12")
         start = i
      else if ($i == "00")
         stop = i
}
{
   for (i=start; i<=stop; ++i)
      printf "%s", $i (i < stop ? OFS : ORS)
}' file
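
The start and stop hours do not have to be hard-coded. Below is a minimal sketch of the same field-index idea, assuming the two-row layout from the question. The function name slice_hours, the awk variables from and to, and the scratch file sample.txt are illustrative names, not part of the original answer. Unlike the version above, it stops at the first "to" hour that follows "from", so an hour that repeats later in the row does not move the range.

```shell
# Sample data from the question, written to a scratch file (the name is arbitrary).
cat > sample.txt <<'EOF'
UTC  06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 00 01 02 03 04 05 06
TMP  54 53 52 50 49 48 47 47 47 48 48 48 48 48 47 45 44 43 43 41 40 39 38 37 36
EOF

# slice_hours FROM TO FILE
# Print, for every row, the fields located under hours FROM..TO of row 1.
slice_hours() {
    awk -v from="$1" -v to="$2" '
    NR == 1 {
        for (i = 2; i <= NF; ++i) {          # field 1 is the row label, skip it
            if (!start && $i == from)        # first occurrence of the start hour
                start = i
            else if (start && $i == to) {    # first stop hour after the start
                stop = i
                break
            }
        }
    }
    start && stop {                          # print nothing if the range was not found
        for (i = start; i <= stop; ++i)
            printf "%s%s", $i, (i < stop ? OFS : ORS)
    }' "$3"
}

slice_hours 12 00 sample.txt
```

Because the hours are compared as numbers here, 00 and 0 are treated as the same hour; that is an assumption that fits this data but may not fit other files.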

CodePudding user response:

I would use GNU AWK for this task as follows. Let the content of file.txt be

UTC  06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 00 01 02 03 04 05 06 
TMP  54 53 52 50 49 48 47 47 47 48 48 48 48 48 47 45 44 43 43 41 40 39 38 37 36

then

awk '/^UTC/{match($0,"12 13 14 15 16 17 18 19 20 21 22 23 00")}{print substr($0,RSTART,RLENGTH)}' file.txt

output

12 13 14 15 16 17 18 19 20 21 22 23 00
47 47 47 48 48 48 48 48 47 45 44 43 43

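Explanation: I use two string functions: match, which sets RSTART and RLENGTH, and substr, which extracts that part of the line (or the part directly below it). The first action is limited to lines that start with UTC; the second is applied to all lines. Because RSTART and RLENGTH keep their values until the next call to match, the same character range is cut from the TMP line, which works as long as the columns of both lines are aligned.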

Disclaimer: This solution assumes that the string to be matched is known beforehand.

(tested in gawk 4.2.1)
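
If the full string is not known beforehand, the same character-position technique can be driven by just the two end hours. The following is only a sketch under the same assumption as above (both rows use identical column widths). The function name slice_columns, the awk variables from, to, start and len, and the file sample.txt are made up for illustration. It uses index instead of match to locate the first "from" hour and the first "to" hour after it.

```shell
# Sample data from the question, written to a scratch file (the name is arbitrary).
cat > sample.txt <<'EOF'
UTC  06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 00 01 02 03 04 05 06
TMP  54 53 52 50 49 48 47 47 47 48 48 48 48 48 47 45 44 43 43 41 40 39 38 37 36
EOF

# slice_columns FROM TO FILE
# Cut the same character range out of every line, based on the UTC row.
slice_columns() {
    awk -v from="$1" -v to="$2" '
    /^UTC/ {
        line = $0 " "                        # pad so the last hour also ends in a space
        s = index(line, " " from " ")        # position of the space before FROM
        if (s) {
            rest = substr(line, s + 1)       # text starting at FROM
            e = index(rest, " " to " ")      # space before the first TO after FROM
            if (e) {
                start = s + 1                # column where FROM begins
                len = e + length(to)         # width up to and including TO
            }
        }
    }
    len { print substr($0, start, len) }     # same columns from every line
    ' "$3"
}

slice_columns 12 00 sample.txt
```

Since this works on character positions rather than fields, it would cut the wrong text if a value in the second row were wider or narrower than the hour above it.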

CodePudding user response:

Would you please try the following:

awk '
NR==1 {                                 # process the 1st line
    for (i = 1; i <= NF; i++) {         # loop over the fields
        if ($i == "12") start = i       # if the hour is "12", remember the position
        else if ($i == "00") {          # if the hour is "00", then
            for (j = start; j <= i; j++) {
                mark[j] = 1             # mark every field from "12" through "00"
            }
        }
    }
}
{
    fs = ""                             # field separator to print out
    for (i = 1; i <= NF; i++) {
        if (mark[i]) {                  # if the mark is set for the field
            printf "%s%s", fs, $i       # then print the field
            fs = " "                    # assign fs for the following printf
        }
    }
    print ""                            # line break
}' file

Input:

UTC  12 13 14 15 16 17 18 19 20 21 22 23 00 01 02 03 04 05 06 07 08 09 10 11 12
TMP  43 43 42 43 44 45 45 46 46 45 43 42 40 39 39 37 36 35 34 34 33 33 32 32 33

Output:

12 13 14 15 16 17 18 19 20 21 22 23 00
43 43 42 43 44 45 45 46 46 45 43 42 40