bash - get previous 4 digits of a hex pattern search

I'm trying to do a hex search for a pattern.

I have a file and I search for a pattern on the file with...

xxd -g 2 -c 32 -u file | grep "0045 5804 0001 0000"

This returns the lines that contain that pattern.

FFFF FFFF FFFF 4556 4E54 0000 0116 0100 08B9 0045 5804 0001 0000 2008 0000 0001

But I want it to return the 4 digits before that pattern.

In this case, 08B9.

How could I do it?

CodePudding user response：

With GNU grep and a Perl-compatible regular expression:

xxd -g 2 -c 32 -u file | grep -Po '....(?= 0045 5804 0001 0000)'

Output:

08B9
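One caveat: `....` matches any four characters, so when the pattern is the first word on a line, the four characters before it are the tail of the address column (e.g. `020:`). A minimal sketch of a stricter variant follows, restricting the preceding word to hex digits; the same `[0-9A-F]{4}` class can replace `....` in the `-P` form. The sketch uses `grep -o` plus `cut` instead of `-P` so it does not depend on PCRE support. The sample lines and the variable names are assumptions made for illustration (the question's line with an address prefix, plus a second line that starts with the pattern).

```shell
pattern='0045 5804 0001 0000'

# Assumed sample: pattern in the middle of line 1 and at the start of line 2.
sample='00000000: FFFF FFFF FFFF 4556 4E54 0000 0116 0100 08B9 0045 5804 0001 0000 2008 0000 0001
00000020: 0045 5804 0001 0000 2008 0000 0001 FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF'

# Any four characters before the pattern: also picks up part of the address.
any4=$(printf '%s\n' "$sample" | grep -o ".... $pattern" | cut -c1-4)

# Four uppercase hex digits before the pattern: address fragments are rejected.
hex4=$(printf '%s\n' "$sample" | grep -o "[0-9A-F]\{4\} $pattern" | cut -c1-4)

printf 'any four characters:\n%s\n' "$any4"
printf 'hex digits only:\n%s\n' "$hex4"
```

With this sample, the first form reports `08B9` and `020:`, while the second reports only `08B9`.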

CodePudding user response：

Don't use grep, use sed, e.g. using any sed:

$ xxd -g 2 -c 32 -u file | sed -n 's/.*\(....\) 0045 5804 0001 0000.*/\1/p'
08B9

CodePudding user response：

A less elegant but intuitively simple approach is to pipe your grep result into sed and delete everything from your search term to the end of the line. The block you want is then the last space-separated 'word' of what remains, which can be retrieved by piping into awk and printing the last field (steps shown on separate lines for presentation; they can be joined into one line):

xxd -g 2 -c 32 -u file | 
grep "0045 5804 0001 0000" | 
sed 's/0045 5804 0001 0000.*//' | 
awk '{print $NF}'
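The same idea can be sketched as a single awk call. This is an illustrative variant, not the pipeline above: `prev_word_awk` is a hypothetical helper name, and it assumes the hex data starts at column 11 (an 8-digit address followed by `: `). Unlike the pipeline, it prints nothing when the pattern is the first word on a line, rather than printing the address.

```shell
# prev_word_awk PATTERN: hypothetical helper, reads xxd output on stdin and
# prints the 4-digit word before the first occurrence of PATTERN on each line.
prev_word_awk() {
    awk -v pat="$1" '
        # index() returns the column of the first occurrence, or 0 if absent.
        # Column 11 is assumed to be where the hex data starts, so a match
        # there has no preceding word on this line and is skipped.
        (i = index($0, pat)) > 11 { print substr($0, i - 5, 4) }
    '
}

# The line from the question, with an assumed address prefix.
line='00000000: FFFF FFFF FFFF 4556 4E54 0000 0116 0100 08B9 0045 5804 0001 0000 2008 0000 0001'
result=$(printf '%s\n' "$line" | prev_word_awk '0045 5804 0001 0000')
printf '%s\n' "$result"
```

In real use the function would be fed by `xxd -g 2 -c 32 -u file` instead of the sample line.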

CodePudding user response：

My xxd prints an 8-digit address, a colon, 16x 4-digit hex codes (separated by spaces), and finally the corresponding raw data from the file, e.g.:

$ xxd -g 2 -c 32 -u  file
         1         2         3         4         5         6         7         8         9         10        11        12        13
1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
00000000: 4120 302E 3730 3220 6173 646C 666B 6A61 7364 666C 6B61 6A73 6466 3B6C 6B61 736A  A 0.702 asdlfkjasdflkajsdf;lkasj
00000020: 6466 6C6B 6173 6A64 660A 4220 302E 3836 3820 6173 646C 666B 6A61 7364 666C 6B61  dflkasjdf.B 0.868 asdlfkjasdflka
00000040: 322E 3135 3220 6173 646C 666B 6A61 7364 666C 6B61 6A73 6466 3B6C 6B61 736A 6466  2.152 asdlfkjasdflkajsdf;lkasjdf
00000060: 6C6B 6173 6A64 660A                                                              lkasjdf.

NOTE: the 1st two lines (a ruler) were added to show column numbering

OP appears to be interested solely in the 4-digit hex codes which means we're interested in the data in columns 11-89 (inclusive).
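The column range can be checked with a quick sketch: the address and `: ` occupy columns 1-10, and sixteen 4-digit codes plus fifteen separating spaces take 16 x 4 + 15 = 79 columns, i.e. columns 11-89. The variable names here are illustrative only.

```shell
# First line of the xxd output shown above.
line='00000000: 4120 302E 3730 3220 6173 646C 666B 6A61 7364 666C 6B61 6A73 6466 3B6C 6B61 736A  A 0.702 asdlfkjasdflkajsdf;lkasj'

# Keep only columns 11-89: the hex codes, without the address or the raw data.
hex=$(printf '%s\n' "$line" | cut -c11-89)

printf '%s\n' "$hex"
printf 'length: %s\n' "${#hex}"
```

The reported length is 79, matching the third argument of a `substr($0,11,79)` call in awk.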

From here we need to address 4x different scenarios:

match could occur at the very beginning of the xxd output in which case there is no preceding 4-digit hex code
match occurs at the beginning of the line so we're interested in the 4-digit hex code at the end of the previous line
match occurs in the middle of the line in which case we're interested in the 4-digit hex code just prior to the match
match spans two lines in which case we're interested in the 4-digit hex code just prior to the match on the 1st line

A contrived set of xxd output to demonstrate all 4x scenarios:

$ cat xxd.out 
00000000: 0045 5804 0001 0000 6173 646C 666B 6A61 7364 666C 6B61 6A73 6466 3B6C 6B61 736A  A 0.702 asdlfkjasdflkajsdf;lkasj
#         ^^^^^^^^^^^^^^^^^^^
00000020: 0045 5804 0001 0000 660A 4220 0045 5804 0001 0000 646C 666B 6A61 7364 0045 5804  dflkasjdf.B 0.868 asdlfkjasdflka
#         ^^^^^^^^^^^^^^^^^^^           ^^^^^^^^^^^^^^^^^^^                     ^^^^^^^^^
00000040: 0001 0000 3B6C 6B61 736A 6466 6C6B 6173 6A64 660A 4320 332E 3436 3720 6173 646C  jsdf;lkasjdflkasjdf.C 3.467 asdl
#         ^^^^^^^^^

NOTE: comments were added to highlight our matches

One idea using awk:

x='0045 5804 0001 0000'

cat xxd.out |                                             # simulate feeding xxd output to awk
awk -v x="${x}" '

function parse_string() {

    while ( length(string) > (2 * lenx) ) {
          pos= index(string,x)

          if (pos) {
             if   (pos==1) output= "NA (at front of file)"
             else          output= substr(string,pos - 5,4)

             cnt++
             printf "Match #%s: %s\n", cnt, output
             string= substr(string,pos + lenx)
          }
          else {
             string= substr(string,length(string) - (2 * lenx))
             break
          }
    }
}

BEGIN { lenx = length(x) }

      { string=string pfx substr($0,11,79)               # append 4-digit hex codes into one long line
        pfx=" "
        if ( length(string) > (1000 * lenx) )
           parse_string()
      }

END   { parse_string() }
'

NOTE: the parse_string() function and the assorted if (length(string) > ...) tests keep the buffered string to roughly 1000x the length of our search pattern (in this example => 1000 x 19 = 19,000 characters); granted, this is 'overkill' for small files, but it allows us to process large(r) files without having to worry about hogging memory (or, in a worst case scenario, an OOM - Out Of Memory - error)

This generates:

Match #1: NA (at front of file)
Match #2: 736A
Match #3: 4220
Match #4: 7364
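For small files where memory is not a concern, the line-spanning scenarios can also be handled by joining the hex columns into one long line before searching. This is a rough sketch under stated assumptions, not a replacement for the awk script: `prev_words` is a hypothetical helper name, the input is assumed to have the column layout described above, and the whole input ends up in a single line. A match at the very start of the file produces no output, and because `grep -o` matches do not overlap, a second occurrence that immediately follows another one is missed.

```shell
# prev_words PATTERN: hypothetical helper, reads `xxd -g 2 -c 32 -u` output
# on stdin and prints the 4-digit word before each occurrence of PATTERN.
prev_words() {
    cut -c11-89 |                     # keep only the hex columns
    tr '\n' ' ' |                     # join lines so a match may span them
    grep -o "[0-9A-F]\{4\} $1" |      # preceding word followed by the pattern
    cut -c1-4                         # keep just the preceding word
}

# The three lines of xxd.out from above, with the raw-data column omitted.
sample='00000000: 0045 5804 0001 0000 6173 646C 666B 6A61 7364 666C 6B61 6A73 6466 3B6C 6B61 736A
00000020: 0045 5804 0001 0000 660A 4220 0045 5804 0001 0000 646C 666B 6A61 7364 0045 5804
00000040: 0001 0000 3B6C 6B61 736A 6466 6C6B 6173 6A64 660A 4320 332E 3436 3720 6173 646C'

result=$(printf '%s\n' "$sample" | prev_words '0045 5804 0001 0000')
printf '%s\n' "$result"
```

For this sample it prints 736A, 4220 and 7364, i.e. matches #2 to #4 of the awk output; match #1 has no preceding word and is silently skipped.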