I have files with a constant stream of letters, capped at 10 letters per line, like so:
ABCDEFGHIJ
XXXXXXXXXX
XXXXXXXXXX
XXXXXXXXXX
XXXXABCDEF
ABCDEFGHIJ
I want to remove the Xs in groups of three, so the result should be:
ABCDEFGHIJ
XABCDEF
ABCDEFGHIJ
My current approach is
sed 's/XXX//g' inputFile > outputFile
but that only considers the pattern within a single line, and results in:
ABCDEFGHIJ
X
X
X
XABCDEF
ABCDEFGHIJ
How do I formulate the search pattern so that it ignores line breaks, essentially accepting XXX, X\nXX, and XX\nX? Is this possible with sed or another command?
CodePudding user response:
With GNU sed, modify your regex to allow an optional newline after each X:
sed -zE 's/X\n{0,1}X\n{0,1}X\n{0,1}//g' inputFile > outputFile
Or shorter:
sed -zE 's/(X\n?){3}//g' inputFile > outputFile
Output to outputFile:
ABCDEFGHIJ
XABCDEF
ABCDEFGHIJ
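-z: separate lines by NUL characters instead of newlines, so a text file without NUL bytes is read as a single record and \n can be matched by the regex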
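To try the command on the sample from the question, a minimal sketch follows. The function name strip_xxx and the temporary file are illustrative assumptions, not part of the answer, and the -z option requires GNU sed.

```shell
#!/bin/sh
# Hypothetical helper: remove groups of three X's, allowing a line break
# inside (or directly after) a group. Requires GNU sed; -z is a GNU extension.
strip_xxx() {
    sed -zE 's/(X\n?){3}//g' "$1"
}

# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' ABCDEFGHIJ XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX XXXXABCDEF ABCDEFGHIJ > "$sample"

# Print the cleaned-up result.
strip_xxx "$sample"
```

Note that the trailing \n? also consumes a newline that directly follows a complete group, so a line ABXXX followed by a line CD would be joined into ABCD. Whether that is acceptable depends on the data.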
CodePudding user response:
This will do it:
paste -sd '' your_file | sed 's/XXX/   /g' | fold -w 10 | sed 's/ //g; /^$/d'
paste -sd '' your_file: merges all the lines into a single line.
sed 's/XXX/   /g': replaces each group of three X's with three spaces, so every remaining character keeps its column position. (Note that this is problematic if the original file contains spaces, since the last step removes them all; in that case, choose some other unique replacement.)
fold -w 10: folds the long line back into lines of 10 characters.
sed 's/ //g; /^$/d': removes the spaces and then deletes any blank lines (if you used some other unique replacement in the second step, remove that instead of spaces).
Output:
ABCDEFGHIJ
XABCDEF
ABCDEFGHIJ
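If the data may contain spaces, the same pipeline can be sketched with a different placeholder. The version below assumes that '#' never occurs in the input; the function name remove_triples is hypothetical, and '\0' is the POSIX spelling of an empty delimiter for paste.

```shell
#!/bin/sh
# Hypothetical variant of the pipeline above, using '#' as the placeholder.
# Assumption: '#' never appears in the input data.
remove_triples() {
    paste -sd '\0' "$1" |        # join all lines into one long line
        sed 's/XXX/###/g' |      # mark each group of three X's, keeping the width
        fold -w 10 |             # restore the original 10-character lines
        sed 's/#//g; /^$/d'      # drop the placeholders, then the empty lines
}

# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' ABCDEFGHIJ XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX XXXXABCDEF ABCDEFGHIJ > "$sample"

remove_triples "$sample"
```

Because the placeholder has the same width as the text it replaces, fold restores the original line boundaries before the placeholders are deleted.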