I'm trying to do a hex search for a pattern.
I have a file and I search for a pattern on the file with...
xxd -g 2 -c 32 -u file | grep "0045 5804 0001 0000"
This returns the lines that contain that pattern:
FFFF FFFF FFFF 4556 4E54 0000 0116 0100 08B9 0045 5804 0001 0000 2008 0000 0001
But I want it to return only the 4 hex digits immediately before that pattern, which is 08B9 in this case. How can I do that?
CodePudding user response:
With GNU grep and a Perl-compatible regular expression:
xxd -g 2 -c 32 -u file | grep -Po '....(?= 0045 5804 0001 0000)'
Output:
08B9
CodePudding user response:
Don't use grep, use sed, e.g. using any sed:
$ xxd -g 2 -c 32 -u file | sed -n 's/.*\(....\) 0045 5804 0001 0000.*/\1/p'
08B9
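For anyone who wants to try this without a binary file at hand, here is a minimal sketch that runs the same sed expression against the sample line from the question. The helper name prev_code_sed is made up for illustration. One thing to keep in mind: the leading .* is greedy, so if a line contains the pattern more than once, only the code before the last occurrence is printed.

```shell
# Hypothetical helper: print the 4 characters before the pattern
# on every input line that contains it.
prev_code_sed() {
    sed -n 's/.*\(....\) 0045 5804 0001 0000.*/\1/p'
}

# Sample line copied from the question, standing in for real xxd output.
printf '%s\n' 'FFFF FFFF FFFF 4556 4E54 0000 0116 0100 08B9 0045 5804 0001 0000 2008 0000 0001' |
    prev_code_sed
```

This prints 08B9 for the sample line.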
CodePudding user response:
A not very elegant but intuitively simple approach might be to pipe your grep result into sed and use a simple regex to substitute your search term with an empty string to the end of the line. This leaves the block you want as the last space-separated 'word' of the result, which can be retrieved by piping into awk and printing the last field (steps shown on separate lines for presentation, join them):
xxd -g 2 -c 32 -u file |
grep "0045 5804 0001 0000" |
sed 's/0045 5804 0001 0000.*//' |
awk '{print $NF}'
CodePudding user response:
nawk 'sub(".* ",_, $!--NF)^_' OFS= FS=' 0045 5804 0001 0000.*$'
mawk '$!NF = $--NF' FS=' 0045 5804 0001 0000.*$| '
gawk ' $_ = $--NF' FS=' 0045 5804 0001 0000.*$| '
08B9
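The one-liners above are quite terse. A more readable sketch of a similar idea, not a drop-in equivalent, is to use the search pattern (with its leading space) as the field separator, so that the first field ends with the code we want. The function name prev_code_fs is hypothetical, and only the first occurrence on each line is reported.

```shell
# Hypothetical helper: split each line on " 0045 5804 0001 0000".
# If the separator occurred (NF > 1), the last 4 characters of the
# first field are the hex code just before the first match.
prev_code_fs() {
    awk -F ' 0045 5804 0001 0000' 'NF > 1 { print substr($1, length($1) - 3) }'
}

# Sample line copied from the question, standing in for real xxd output.
printf '%s\n' 'FFFF FFFF FFFF 4556 4E54 0000 0116 0100 08B9 0045 5804 0001 0000 2008 0000 0001' |
    prev_code_fs
```

Lines where the pattern sits at the very start have no leading space before it, so they produce no output here.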
CodePudding user response:
Just make a lookahead and print only the matched string
xxd -g 2 -c 32 -u file | grep -Po "[0-9A-F]{4} (?=0045 5804 0001 0000)"
xxd -g 2 -c 32 -u file | perl -lne 'print for /([0-9A-F]{4}) (?=0045 5804 0001 0000)/'
08B9
But searching the hex representation like that is just silly because:
- It won't work when the pattern 0045 5804 0001 0000 is at the beginning of the line (i.e. the output is on the previous line)
- It'll be much slower than searching directly in binary
So just search directly with grep, then decode like this:
grep -Pao "..\x00\x45\x58\x04\x00\x01\x00\x00" file | xxd -p -u -l 2
grep -ao $'..\x00\x45\x58\x04\x00\x01\x00\x00' file | xxd -p -u -l 2
The second form also works, but not in every case: a bash $'...' string cannot hold null bytes, so a pattern containing \x00 is cut off at the first one.
If the pattern contains LF (\n) then you'll also need the -z option:
grep -Pzao "..<hex pattern>" file | xxd -p -u -l 2
grep -zao $'..<hex pattern>' file | xxd -p -u -l 2
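Note that piping into xxd -l 2 decodes only the first two bytes of the stream, i.e. only the first match. Below is a rough sketch of one way to report every occurrence. It is a different technique from the grep commands above: it assumes od, tr and awk are available, dumps the file as one line of space-separated bytes, and scans that line. The function name prev_two_bytes is made up for illustration, and because the whole dump is held in a single awk record it is only meant for small files.

```shell
# Hypothetical helper: prev_two_bytes FILE
# Prints, as 4 uppercase hex digits, the two bytes that precede each
# occurrence of the byte sequence 00 45 58 04 00 01 00 00 in FILE.
prev_two_bytes() {
    # One lowercase hex byte per token, all on a single line.
    { od -An -v -tx1 "$1" | tr '\n' ' ' | tr -s ' '; echo; } |
    awk '{
        s  = $0
        re = "[0-9a-f][0-9a-f] [0-9a-f][0-9a-f] 00 45 58 04 00 01 00 00"
        while (match(s, re)) {
            # RSTART points at the first of the two preceding bytes.
            print toupper(substr(s, RSTART, 2) substr(s, RSTART + 3, 2))
            s = substr(s, RSTART + 3)   # advance one byte and keep scanning
        }
    }'
}
```

Because every byte is its own token, a match can never start in the middle of a byte, and line breaks in the dump play no role.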
CodePudding user response:
I would harness GNU AWK for this task in the following way. Let file.txt content be
FFFF FFFF FFFF 4556 4E54 0000 0116 0100 08B9 0045 5804 0001 0000 2008 0000 0001
then
awk 'match($0, /[[:xdigit:]]{4} 0045 5804 0001 0000/){print substr($0,RSTART,4)}' file.txt
gives output
08B9
Explanation: I use two String Functions: match, to check whether the current line ($0) contains the pattern and to set the RSTART variable, then substr, to get the first 4 characters of the match. [[:xdigit:]] denotes a base-16 digit and {4} the number of repeats.
(tested in gawk 4.2.1)
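The command above reports only the first match on each line. As a sketch of how the same match/substr idea could be extended to every match on a line, the loop below rescans the remainder of the line after each hit. The function name all_prev_codes is hypothetical, and the four explicit [0-9A-F] classes are used instead of [[:xdigit:]]{4} only on the assumption that some non-GNU awks lack interval expressions.

```shell
# Hypothetical helper: print the 4-digit code before every occurrence
# of the pattern on each input line.
all_prev_codes() {
    awk '{
        s = $0
        while (match(s, /[0-9A-F][0-9A-F][0-9A-F][0-9A-F] 0045 5804 0001 0000/)) {
            print substr(s, RSTART, 4)
            # Skip only the reported code and its trailing space, so a
            # back-to-back occurrence of the pattern is still found.
            s = substr(s, RSTART + 5)
        }
    }'
}

printf '%s\n' 'AAAA 0045 5804 0001 0000 BBBB 0045 5804 0001 0000' | all_prev_codes
```

For this input the loop prints AAAA and then BBBB, one per line.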
CodePudding user response:
My xxd prints an 8-digit address, a :, 16x 4-digit hex codes (separated by spaces), and finally the corresponding raw data from the file, eg:
$ xxd -g 2 -c 32 -u file
         1         2         3         4         5         6         7         8         9        10        11        12        13
1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
00000000: 4120 302E 3730 3220 6173 646C 666B 6A61 7364 666C 6B61 6A73 6466 3B6C 6B61 736A A 0.702 asdlfkjasdflkajsdf;lkasj
00000020: 6466 6C6B 6173 6A64 660A 4220 302E 3836 3820 6173 646C 666B 6A61 7364 666C 6B61 dflkasjdf.B 0.868 asdlfkjasdflka
00000040: 322E 3135 3220 6173 646C 666B 6A61 7364 666C 6B61 6A73 6466 3B6C 6B61 736A 6466 2.152 asdlfkjasdflkajsdf;lkasjdf
00000060: 6C6B 6173 6A64 660A lkasjdf.
NOTE: the 1st two lines (a ruler) added to show column numbering
OP appears to be interested solely in the 4-digit hex codes which means we're interested in the data in columns 11-89 (inclusive).
From here we need to address four different scenarios:
- match occurs at the very beginning of the xxd output, in which case there is no preceding 4-digit hex code
- match occurs at the beginning of a line, so we're interested in the 4-digit hex code at the end of the previous line
- match occurs in the middle of a line, in which case we're interested in the 4-digit hex code just prior to the match
- match spans two lines, in which case we're interested in the 4-digit hex code just prior to the match on the 1st line
A contrived set of xxd output to demonstrate all four scenarios:
$ cat xxd.out
00000000: 0045 5804 0001 0000 6173 646C 666B 6A61 7364 666C 6B61 6A73 6466 3B6C 6B61 736A A 0.702 asdlfkjasdflkajsdf;lkasj
#         ^^^^^^^^^^^^^^^^^^^
00000020: 0045 5804 0001 0000 660A 4220 0045 5804 0001 0000 646C 666B 6A61 7364 0045 5804 dflkasjdf.B 0.868 asdlfkjasdflka
#         ^^^^^^^^^^^^^^^^^^^           ^^^^^^^^^^^^^^^^^^^                     ^^^^^^^^^
00000040: 0001 0000 3B6C 6B61 736A 6466 6C6B 6173 6A64 660A 4320 332E 3436 3720 6173 646C jsdf;lkasjdflkasjdf.C 3.467 asdl
#         ^^^^^^^^^
NOTE: comments added to highlight our matches
One idea using awk:
x='0045 5804 0001 0000'
cat xxd.out |                        # simulate feeding xxd output to awk
awk -v x="${x}" '
function parse_string() {
    while ( length(string) > (2 * lenx) ) {
        pos = index(string,x)
        if (pos) {
            if (pos==1) output = "NA (at front of file)"
            else        output = substr(string,pos - 5,4)
            cnt++
            printf "Match #%s: %s\n", cnt, output
            string = substr(string,pos + lenx)
        }
        else {
            string = substr(string,length(string) - (2 * lenx))
            break
        }
    }
}
BEGIN { lenx = length(x) }
      { string = string substr($0,11,80)    # strip off address & raw data, append 4-digit hex codes into one long string
        if ( length(string) > (1000 * lenx) )
            parse_string()
      }
END   { parse_string() }
'
NOTE: the parse_string() function and the assorted if (length(string) > ...) tests allow us to limit memory usage to 1000x the length of our search pattern (in this example => 1000 x 19 = 19,000); granted, 'overkill' in the case of small files but it allows us to process large(r) files without having to worry about hogging memory (or in a worst case scenario: an OOM - Out Of Memory - error)
This generates:
Match #1: NA (at front of file)
Match #2: 736A
Match #3: 4220
Match #4: 7364