grep is great at finding lines that match a pattern. But what if you have a file with a single extremely long line (say a 100MB file), and you want to find chunks within it that match a pattern?
For each match, you'd want to print the offset, and the matched string, with extra characters on either side for context.
In Python, you could write something like this (clamping the start index so the slice doesn't wrap around; slicing past the end is already safe):
[(m.start(), s[max(m.start() - 50, 0):m.end() + 50]) for m in re.finditer(regex, s)]
But is there some way to do the equivalent using standard linux command line tools?
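grep itself gets partway there: with -o it prints only the matching part, and with -b it prefixes each match with its byte offset in the file. What it can't do is include the surrounding context characters. A small sketch (the sample string is made up for illustration):

```shell
# -o: print only the matched text; -b: prefix each match with its byte offset.
# Offsets here are 0-based, like the Python version above.
printf 'xxfooyyfoozz' | grep -obE 'foo'
# 2:foo
# 7:foo
```

So grep answers the "where" half of the question, but not the "with context" half.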
CodePudding user response:
For each match, you'd want to print the offset, and the matched string, with extra characters on either side for context.
You can do that with awk like this (note that awk's match() and substr() use 1-based positions, unlike the Python version):
awk '{
  i = 1
  while (match(substr($0, i), /regex/)) {
    off = i + RSTART - 1
    print off, substr($0, off > 50 ? off - 50 : 1, RLENGTH + 100)
    i = off + RLENGTH
  }
}' file
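For a concrete run, here is the same program applied to a short made-up input with /foo/ as the pattern (on a line this short the 50-character context window simply covers the whole line):

```shell
# Each iteration searches the tail of the line starting at i, converts the
# relative RSTART into an absolute 1-based offset, prints the match with up
# to 50 characters of context on each side, and resumes after the match.
printf 'xxfooyyfoozz\n' | awk '{
  i = 1
  while (match(substr($0, i), /foo/)) {
    off = i + RSTART - 1
    print off, substr($0, off > 50 ? off - 50 : 1, RLENGTH + 100)
    i = off + RLENGTH
  }
}'
# 3 xxfooyyfoozz
# 8 xxfooyyfoozz
```

Because the loop advances i to just past each match, overlapping occurrences of the pattern are not reported; for most grep-like uses that is the behavior you want.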