I'm trying to do a hex search for a pattern.
I have a file and I search for a pattern on the file with...
xxd -g 2 -c 32 -u file | grep "0045 5804 0001 0000"
This returns the lines that contain that pattern.
FFFF FFFF FFFF 4556 4E54 0000 0116 0100 08B9 0045 5804 0001 0000 2008 0000 0001
But I want it to return the 4 digits before that pattern.
In this case the 08B9
How could I do it?
CodePudding user response:
With GNU grep and a Perl-compatible regular expression:
xxd -g 2 -c 32 -u file | grep -Po '....(?= 0045 5804 0001 0000)'
Output:
08B9
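If your grep lacks -P (BSD/macOS grep, for example), a rough equivalent is to let grep -o print the preceding group together with the pattern and then cut off the first 4 characters. This is only a sketch: hex_before is a made-up helper name, and the [0-9A-F]{4} class assumes the upper-case, 4-digit groups produced by xxd -g 2 -u. Unlike the .... above, that class cannot pick up a fragment of the address column when the pattern is the first group on a line:

```shell
# Hypothetical helper: print the 4 hex digits before each in-line match.
# Reads xxd-style text on stdin; assumes xxd -g 2 -u formatting.
# Note: -o is not in POSIX grep, but GNU and BSD grep both support it.
hex_before() {
    grep -oE '[0-9A-F]{4} 0045 5804 0001 0000' | cut -c1-4
}

# Demo on a line shaped like the one in the question.
printf '%s\n' '00000000: FFFF FFFF FFFF 4556 4E54 0000 0116 0100 08B9 0045 5804 0001 0000 2008 0000 0001' |
hex_before
```

For the demo line this prints 08B9. A line that starts with the pattern produces no output, because the character before the pattern is then the ":" of the address.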
CodePudding user response:
Don't use grep, use sed, e.g. using any sed:
$ xxd whatever | sed -n 's/.*\(....\) 0045 5804 0001 0000.*/\1/p'
08B9
CodePudding user response:
A not very elegant but intuitively simple approach might be to pipe your grep result into sed and use a simple regex to replace your search term, together with everything after it up to the end of the line, with an empty string. This leaves the block you want as the last space-separated 'word' of the result, which can be retrieved by piping into awk and printing the last field (steps shown on separate lines for presentation; join them if you prefer):
xxd -g 2 -c 32 -u file |
grep "0045 5804 0001 0000" |
sed 's/0045 5804 0001 0000.*//' |
awk '{print $NF}'
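To make the two stages easier to follow, here is a small sketch that runs them separately on the sample line from the question. The variable names line and trimmed are only for illustration:

```shell
# Sample line from the question (hex groups only).
line='FFFF FFFF FFFF 4556 4E54 0000 0116 0100 08B9 0045 5804 0001 0000 2008 0000 0001'

# Stage 1: delete the pattern and everything after it.
trimmed=$(printf '%s\n' "$line" | sed 's/0045 5804 0001 0000.*//')
printf '[%s]\n' "$trimmed"     # brackets make the trailing space visible

# Stage 2: the wanted group is now the last whitespace-separated field.
printf '%s\n' "$trimmed" | awk '{print $NF}'
```

The second stage prints 08B9. Note that if the pattern is the first group on an xxd line, the last remaining field is the address (something like 00000020:) rather than a hex code.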
CodePudding user response:
My xxd prints an 8-digit address, a :, 16x 4-digit hex codes (separated by spaces), and finally the corresponding raw data from the file, e.g.:
$ xxd -g 2 -c 32 -u file
         1         2         3         4         5         6         7         8         9        10        11        12        13
1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
00000000: 4120 302E 3730 3220 6173 646C 666B 6A61 7364 666C 6B61 6A73 6466 3B6C 6B61 736A A 0.702 asdlfkjasdflkajsdf;lkasj
00000020: 6466 6C6B 6173 6A64 660A 4220 302E 3836 3820 6173 646C 666B 6A61 7364 666C 6B61 dflkasjdf.B 0.868 asdlfkjasdflka
00000040: 322E 3135 3220 6173 646C 666B 6A61 7364 666C 6B61 6A73 6466 3B6C 6B61 736A 6466 2.152 asdlfkjasdflkajsdf;lkasjdf
00000060: 6C6B 6173 6A64 660A lkasjdf.
NOTE: the first two lines (a ruler) were added to show column numbering
OP appears to be interested solely in the 4-digit hex codes which means we're interested in the data in columns 11-89 (inclusive).
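The 11-89 range follows from the layout: 8 address digits, a colon and a space take up columns 1-10, and 16 groups of 4 digits with 15 separating spaces take up the next 79. As a sketch for other -c / -g settings, the hypothetical helper below (hex_columns is not part of xxd) does the same arithmetic, assuming an 8-digit address and a -c value that is a multiple of -g:

```shell
# Hypothetical helper: print the "first-last" column range of the hex groups
# for a given xxd -c (bytes per line) and -g (bytes per group).
# Assumes an 8-digit address followed by ":" and one space, and that
# -c is a multiple of -g.
hex_columns() {
    cols=$1 grp=$2
    groups=$(( cols / grp ))                     # number of groups per line
    first=$(( 8 + 1 + 1 + 1 ))                   # address + ":" + space, then next column
    width=$(( groups * grp * 2 + groups - 1 ))   # hex digits plus separating spaces
    echo "${first}-$(( first + width - 1 ))"
}

hex_columns 32 2    # prints 11-89
```

The printed range can be handed straight to cut -c to isolate the hex groups.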
From here we need to address 4x different scenarios:
- match could occur at the very beginning of the xxd output in which case there is no preceding 4-digit hex code
- match occurs at the beginning of the line so we're interested in the 4-digit hex code at the end of the previous line
- match occurs in the middle of the line in which case we're interested in the 4-digit hex code just prior to the match
- match spans two lines in which case we're interested in the 4-digit hex code just prior to the match on the 1st line
A contrived set of xxd output to demonstrate all 4x scenarios:
$ cat xxd.out
00000000: 0045 5804 0001 0000 6173 646C 666B 6A61 7364 666C 6B61 6A73 6466 3B6C 6B61 736A A 0.702 asdlfkjasdflkajsdf;lkasj
#         ^^^^^^^^^^^^^^^^^^^
00000020: 0045 5804 0001 0000 660A 4220 0045 5804 0001 0000 646C 666B 6A61 7364 0045 5804 dflkasjdf.B 0.868 asdlfkjasdflka
#         ^^^^^^^^^^^^^^^^^^^           ^^^^^^^^^^^^^^^^^^^                     ^^^^^^^^^
00000040: 0001 0000 3B6C 6B61 736A 6466 6C6B 6173 6A64 660A 4320 332E 3436 3720 6173 646C jsdf;lkasjdflkasjdf.C 3.467 asdl
#         ^^^^^^^^^
NOTE: comments added to highlight our matches
One idea using awk:
x='0045 5804 0001 0000'
cat xxd.out |                                   # simulate feeding xxd output to awk
awk -v x="${x}" '
function parse_string() {
    while ( length(string) > (2 * lenx) ) {
        pos= index(string,x)                    # position of the next match (0 if none)
        if (pos) {
            if (pos==1) output= "NA (at front of file)"
            else        output= substr(string,pos - 5,4)    # 4 digits + 1 space precede the match
            cnt++
            printf "Match #%s: %s\n", cnt, output
            string= substr(string,pos + lenx)   # drop everything through the end of the match
        }
        else {
            string= substr(string,length(string) - (2 * lenx))   # keep only the tail for the next pass
            break
        }
    }
}
BEGIN { lenx = length(x) }
      { string=string pfx substr($0,11,79)      # append 4-digit hex codes into one long line
        pfx=" "
        if ( length(string) > (1000 * lenx) )
            parse_string()
      }
END   { parse_string() }
'
NOTE: the parse_string() function and the assorted if (length(string) > ...) tests allow us to limit memory usage to 1000x the length of our search pattern (in this example => 1000 x 19 = 19,000); granted, 'overkill' in the case of small files but it allows us to process large(r) files without having to worry about hogging memory (or in a worst case scenario: an OOM - Out Of Memory - error)
This generates:
Match #1: NA (at front of file)
Match #2: 736A
Match #3: 4220
Match #4: 7364
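When memory is not a concern (small files), the same four scenarios can also be sketched with standard tools by first joining the hex columns into one long line. The helper names flatten_hex and hex_before_flat are made up for this example. It assumes the 11-89 column layout described above and a grep that supports -o and -E. It also shares one limitation with simple left-to-right matching: if two matches sit back to back, only the first is reported.

```shell
# Hypothetical helper: keep the hex columns (11-89) of each xxd line and join
# all lines with single spaces, so line boundaries no longer matter.
flatten_hex() {
    cut -c11-89 | paste -sd' ' -
}

# Hypothetical helper: print the group before each match, or "NA" for a match
# at the very start of the data (nothing precedes it).
hex_before_flat() {
    flatten_hex |
    grep -oE '(^|[0-9A-F]{4} )0045 5804 0001 0000' |
    awk '{ print (NF == 5 ? $1 : "NA") }'
}

# Demo on the hex part of the contrived xxd.out shown earlier.
hex_before_flat <<'EOF'
00000000: 0045 5804 0001 0000 6173 646C 666B 6A61 7364 666C 6B61 6A73 6466 3B6C 6B61 736A
00000020: 0045 5804 0001 0000 660A 4220 0045 5804 0001 0000 646C 666B 6A61 7364 0045 5804
00000040: 0001 0000 3B6C 6B61 736A 6466 6C6B 6173 6A64 660A 4320 332E 3436 3720 6173 646C
EOF
```

Each grep match is either the bare pattern (4 fields, so nothing precedes it) or the pattern with its preceding group (5 fields), which is what the awk test on NF distinguishes.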