Get previous 4 digits of a hex pattern search

I'm trying to do a hex search for a pattern.

I have a file and I search for a pattern on the file with...

xxd -g 2 -c 32 -u file | grep "0045 5804 0001 0000"

This returns the lines that contain that pattern:

FFFF FFFF FFFF 4556 4E54 0000 0116 0100 08B9 0045 5804 0001 0000 2008 0000 0001

But I want it to return the 4 digits before that pattern which is 08B9 in this case. How could I do it?

CodePudding user response：

With GNU grep and a Perl-compatible regular expression:

xxd -g 2 -c 32 -u file | grep -Po '....(?= 0045 5804 0001 0000)'

Output:

08B9

CodePudding user response：

Don't use grep, use sed, e.g. using any sed:

$ xxd -g 2 -c 32 -u file | sed -n 's/.*\(....\) 0045 5804 0001 0000.*/\1/p'
08B9

CodePudding user response：

A not very elegant but intuitively simple approach is to pipe the grep result into sed and delete the search term together with everything after it on the line. The block you want is then the last space-separated 'word' of what remains, which awk can print as the last field (the steps are shown on separate lines for presentation; they form a single pipeline):

xxd -g 2 -c 32 -u file | 
grep "0045 5804 0001 0000" | 
sed 's/0045 5804 0001 0000.*//' | 
awk '{print $NF}'
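
To make the three stages concrete, here is a minimal sketch that runs each one on the sample line from the question and keeps the intermediate results. The variable names (line, pattern, stage1 to stage3) are illustrative only and are not part of the answer above.

```shell
# Sample line from the question (xxd address column already absent).
line='FFFF FFFF FFFF 4556 4E54 0000 0116 0100 08B9 0045 5804 0001 0000 2008 0000 0001'
pattern='0045 5804 0001 0000'

# Stage 1: keep only lines that contain the pattern.
stage1=$(printf '%s\n' "$line" | grep "$pattern")
# Stage 2: delete the pattern and everything after it.
stage2=$(printf '%s\n' "$stage1" | sed "s/$pattern.*//")
# Stage 3: print the last whitespace-separated field of what is left.
stage3=$(printf '%s\n' "$stage2" | awk '{print $NF}')

printf 'after grep: [%s]\nafter sed:  [%s]\nafter awk:  [%s]\n' "$stage1" "$stage2" "$stage3"
```

Note that the sed stage leaves a trailing space after 08B9; awk's default field splitting ignores it, which is why $NF still returns the wanted group.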

CodePudding user response：

nawk 'sub(".* ",_, $!--NF)^_' OFS= FS=' 0045 5804 0001 0000.*$'

mawk '$!NF = $--NF' FS=' 0045 5804 0001 0000.*$| '
gawk '  $_ = $--NF' FS=' 0045 5804 0001 0000.*$| '

08B9

CodePudding user response：

Just make a lookahead and print only the matched string

xxd -g 2 -c 32 -u file | grep -Po "[0-9A-F]{4}(?= 0045 5804 0001 0000)"
xxd -g 2 -c 32 -u file | perl -lne 'print for /([0-9A-F]{4}) (?=0045 5804 0001 0000)/'
08B9

But searching the hex representation like that is just silly because:

It won't work when the pattern 0045 5804 0001 0000 is at the beginning of a line (the 4 digits you want are then at the end of the previous line) or is split across two lines
It'll be much slower than to search directly in binary
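
The first limitation can be reproduced without a real file. The sketch below uses two hypothetical xxd-style lines (address and ASCII columns already stripped) in which the pattern starts the second line. Matching line by line finds nothing, while joining the lines with paste first recovers the group. Joining the whole dump into a single line is an assumption that only suits small inputs.

```shell
# sed expression from the earlier answer, kept in a variable for reuse.
pattern_re='.*\(....\) 0045 5804 0001 0000.*'

# Hypothetical dump: the wanted group 08B9 ends line 1, the pattern starts line 2.
dump='FFFF FFFF 08B9
0045 5804 0001 0000 2008'

# Line-by-line matching: no line holds both the group and the pattern.
per_line=$(printf '%s\n' "$dump" | sed -n "s/$pattern_re/\1/p")

# Join all lines with single spaces first, then match.
joined=$(printf '%s\n' "$dump" | paste -s -d ' ' - | sed -n "s/$pattern_re/\1/p")

printf 'per line: [%s]\njoined:   [%s]\n' "$per_line" "$joined"
```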

So just search directly with grep then decode like this

grep -Pao "..\x00\x45\x58\x04\x00\x01\x00\x00" file | xxd -p -u -l 2

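grep -ao $'..\x00\x45\x58\x04\x00\x01\x00\x00' file | xxd -p -u -l 2 looks equivalent, but it only works for patterns without null bytes: a command-line argument cannot contain NUL, so the shell truncates this particular pattern at the first \x00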

If the pattern contains LF \n then you'll also need the -z option

grep -Pzao "..<hex pattern>" file | xxd -p -u -l 2
grep -zao $'..<hex pattern>' file | xxd -p -u -l 2
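
As a rough, self-contained check of the direct binary search, the sketch below builds a small hypothetical file with printf and runs the grep -Pao form on it. Three things here are assumptions rather than part of the answer: LC_ALL=C is added so that "." matches single bytes regardless of the locale, od stands in for xxd when decoding, and the file contents are made up. It needs a grep built with -P support (GNU grep). As with xxd -l 2, only the first match is decoded.

```shell
tmp=$(mktemp)

# Hypothetical test data, written with octal escapes:
# FF FF 08 B9 00 45 58 04 00 01 00 00 20 08
printf '\377\377\010\271\000\105\130\004\000\001\000\000\040\010' > "$tmp"

# -a: treat binary as text, -o: print only the match, -P: allow \xHH escapes.
# od decodes the first 2 bytes of the match; tr strips spaces and uppercases.
prev=$(LC_ALL=C grep -Pao '..\x00\x45\x58\x04\x00\x01\x00\x00' "$tmp" |
       od -An -tx1 -N2 | tr -d ' \n' | tr 'a-f' 'A-F')

rm -f "$tmp"
printf '%s\n' "$prev"
```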

CodePudding user response：

I would use GNU AWK for this task in the following way. Let the content of file.txt be

FFFF FFFF FFFF 4556 4E54 0000 0116 0100 08B9 0045 5804 0001 0000 2008 0000 0001

then

awk 'match($0, /[[:xdigit:]]{4} 0045 5804 0001 0000/){print substr($0,RSTART,4)}' file.txt

gives output

08B9

Explanation: I use two string functions: match checks whether the current line ($0) contains the pattern and sets the RSTART variable to the position where the match starts, then substr extracts the first 4 characters of the match. [[:xdigit:]] denotes a base-16 digit and {4} the number of repeats.
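
match only reports the leftmost match on a line. If a line can contain the pattern more than once, one possible extension is to loop, cutting the string just past each reported group. This is a sketch under assumptions: the function name all_prev_groups and the sample line are made up, and the {4} interval is spelled out as four bracket expressions so that awk implementations without interval support can run it too.

```shell
all_prev_groups() {
    awk '{
        s = $0
        # Repeat while a 4-digit group followed by the pattern is still found.
        while (match(s, /[0-9A-F][0-9A-F][0-9A-F][0-9A-F] 0045 5804 0001 0000/)) {
            print substr(s, RSTART, 4)   # the group just before the pattern
            s = substr(s, RSTART + 5)    # continue right after that group
        }
    }'
}

# Hypothetical line with two occurrences of the pattern.
line='AAAA 0045 5804 0001 0000 BBBB 0045 5804 0001 0000 CCCC'
out=$(printf '%s\n' "$line" | all_prev_groups)
printf '%s\n' "$out"
```

Advancing by only 5 characters (one group plus its space) rather than past the whole match keeps back-to-back occurrences visible, since the group preceding the second occurrence is then the tail of the first.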

(tested in gawk 4.2.1)

CodePudding user response：

My xxd prints an 8-digit address, a colon, sixteen 4-digit hex codes (separated by spaces), and finally the corresponding raw data from the file, e.g.:

$ xxd -g 2 -c 32 -u  file
         1         2         3         4         5         6         7         8         9         10        11        12        13
1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
00000000: 4120 302E 3730 3220 6173 646C 666B 6A61 7364 666C 6B61 6A73 6466 3B6C 6B61 736A  A 0.702 asdlfkjasdflkajsdf;lkasj
00000020: 6466 6C6B 6173 6A64 660A 4220 302E 3836 3820 6173 646C 666B 6A61 7364 666C 6B61  dflkasjdf.B 0.868 asdlfkjasdflka
00000040: 322E 3135 3220 6173 646C 666B 6A61 7364 666C 6B61 6A73 6466 3B6C 6B61 736A 6466  2.152 asdlfkjasdflkajsdf;lkasjdf
00000060: 6C6B 6173 6A64 660A                                                              lkasjdf.

NOTE: the first two lines (a ruler) were added to show column numbering

OP appears to be interested solely in the 4-digit hex codes which means we're interested in the data in columns 11-89 (inclusive).
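
A quick way to check that column range is to cut it out of one line. The sketch below uses the first xxd line shown above; cut is used here only for illustration and is not part of the solution that follows.

```shell
# First line of the xxd output above: 10-character address prefix,
# 16 hex groups, then the raw-data column.
line='00000000: 4120 302E 3730 3220 6173 646C 666B 6A61 7364 666C 6B61 6A73 6466 3B6C 6B61 736A  A 0.702 asdlfkjasdflkajsdf;lkasj'

# Columns 11-89 hold the 16 groups: 16 * 4 digits + 15 separating spaces = 79 characters.
hex=$(printf '%s\n' "$line" | cut -c11-89)

printf '[%s] (%s characters)\n' "$hex" "${#hex}"
```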

From here we need to address 4x different scenarios:

match could occur at the very beginning of the xxd output in which case there is no preceding 4-digit hex code
match occurs at the beginning of the line so we're interested in the 4-digit hex code at the end of the previous line
match occurs in the middle of the line in which case we're interested in the 4-digit hex code just prior to the match
match spans two lines in which case we're interested in the 4-digit hex code just prior to the match on the 1st line

A contrived set of xxd output to demonstrate all 4x scenarios:

$ cat xxd.out 
00000000: 0045 5804 0001 0000 6173 646C 666B 6A61 7364 666C 6B61 6A73 6466 3B6C 6B61 736A  A 0.702 asdlfkjasdflkajsdf;lkasj
#         ^^^^^^^^^^^^^^^^^^^
00000020: 0045 5804 0001 0000 660A 4220 0045 5804 0001 0000 646C 666B 6A61 7364 0045 5804  dflkasjdf.B 0.868 asdlfkjasdflka
#         ^^^^^^^^^^^^^^^^^^^           ^^^^^^^^^^^^^^^^^^^                     ^^^^^^^^^
00000040: 0001 0000 3B6C 6B61 736A 6466 6C6B 6173 6A64 660A 4320 332E 3436 3720 6173 646C  jsdf;lkasjdflkasjdf.C 3.467 asdl
#         ^^^^^^^^^

NOTE: comments added to highlight our matches

One idea using awk:

x='0045 5804 0001 0000'

cat xxd.out |                                             # simulate feeding xxd output to awk
awk -v x="${x}" '

function parse_string() {

    while ( length(string) > (2 * lenx) ) {
          pos= index(string,x)

          if (pos) {
             if   (pos==1) output= "NA (at front of file)"
             else          output= substr(string,pos - 5,4)

             cnt++
             printf "Match #%s: %s\n", cnt, output
             string= substr(string,pos + lenx)
          }
          else {
             string= substr(string,length(string) - (2 * lenx))
             break
          }
    }
}

BEGIN { lenx = length(x) }

      { string=string substr($0,11,80)                   # strip off address & raw data, append 4-digit hex codes into one long string
        if ( length(string) > (1000 * lenx) )
           parse_string()
      }

END   { parse_string() }
'

NOTE: the parse_string() function and the assorted if (length(string) > ...) tests allow us to limit memory usage to 1000x the length of our search pattern (in this example => 1000 x 19 = 19,000); granted, 'overkill' in the case of small files but it allows us to process large(r) files without having to worry about hogging memory (or in a worst case scenario: an OOM - Out Of Memory - error)

This generates:

Match #1: NA (at front of file)
Match #2: 736A
Match #3: 4220
Match #4: 7364