I am trying to remove the first N characters from a text file, and what's important is that it is done NOT LINE BY LINE.
Currently the code I have written deletes 'i' characters from EACH line, but I want to delete from the whole text.
for FILE in *; do
  x=$(wc -c < "$FILE")
  for ((i = 1; i <= x; i++)); do
    sed "s/^.\{$i\}//" "$FILE" > "$i"
  done
done
For example, I have this XML file in the directory xml/root.xml:
<ticket id="usa-001" REFUND="NO" TEST="TEST">
<airline>Us Airlines</airline>
<emptytag id="usa-001" REFUND="NO" TEST="TEST"/>
<preis>30</preis><seat>
<allseats>120</allseats>
</ticket>
What I want is to delete the first N characters and save the result into a new file. Let's say N is 5, so it would be:
et id="usa-001" REFUND="NO" TEST="TEST">
<airline>Us Airlines</airline>
<emptytag id="usa-001" REFUND="NO" TEST="TEST"/>
<preis>30</preis><seat>
<allseats>120</allseats>
</ticket>
CodePudding user response:
Using GNU sed:
$ sed -Ez 's/^.{5}//' root.xml > 5
$ cat 5
et id="usa-001" REFUND="NO" TEST="TEST">
<airline>Us Airlines</airline>
<emptytag id="usa-001" REFUND="NO" TEST="TEST"/>
<preis>30</preis><seat>
<allseats>120</allseats>
</ticket>
If you want to remove up to 5 characters in files that have fewer than 5, then use {1,5} instead of {5}.
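Wrapped around the asker's original multi-file loop, the same GNU sed call might look like this. This is a minimal sketch; the demo input and the `.trimmed` output suffix are placeholders, not part of the question:

```shell
cd "$(mktemp -d)"                                   # scratch dir for the demo
printf '<ticket id="usa-001">x</ticket>' > demo.xml # sample input

N=5
for f in *.xml; do
  # -z reads the whole file as one record, so only the first N
  # characters of the file (not of every line) are removed
  sed -Ez "s/^.{$N}//" "$f" > "$f.trimmed"
done
cat demo.xml.trimmed   # prints: et id="usa-001">x</ticket>
```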
CodePudding user response:
With your shown samples, please try the following awk code. Written and tested in GNU awk.
For single Input_file:
awk -i inplace -v RS='^.{5}' -v ORS='' 'END{print}' Input_file
For multiple Input_file(s) with GNU awk: using the ENDFILE rule here, which, as the name suggests, runs at the end of each Input_file.
awk -i inplace -v RS='^.{5}' -v ORS='' 'ENDFILE{print}' *
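If the regex-RS trick is unavailable (it needs GNU awk), a more portable sketch, not part of the original answer, is to slurp the lines into one string and use substr; this assumes the file fits in memory:

```shell
printf 'abcde\nfghij\n' > demo.txt   # sample input

# Rebuild the whole file as one string (re-adding the newlines awk
# strips), then print everything past the first n characters of it.
awk -v n=5 '{ s = s $0 "\n" } END { printf "%s", substr(s, n + 1) }' demo.txt
```

Here the first 5 characters are "abcde", so the output is the remainder: a blank line followed by fghij.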
CodePudding user response:
If you really just want to filter out the first n characters of a file, the tool you want is dd, which allows you to specify the number of blocks to skip. If you want a block size of 1, specify that with bs. For example, to skip the first 2 characters of the input file, use:
$ echo foobarbaz | dd bs=1 skip=2 2> /dev/null
obarbaz
You can specify an input file with if, but it's probably simpler to redirect. dd writes a bunch of diagnostics to stderr, and the output redirection is just there to suppress those messages. This will be slow as dirt since the block size is so small, but (if you have a dd which supports this) you can be much faster than sed with:
dd iflag=skip_bytes skip=5
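Putting the fast variant together, a minimal sketch assuming a GNU dd that supports iflag=skip_bytes (the file names are placeholders):

```shell
cd "$(mktemp -d)"            # scratch directory for the demo files
printf 'abcdefgh' > demo.txt # sample input

# skip_bytes makes skip= count bytes instead of blocks, so no bs=1 is
# needed and large files still move at a sensible block size.
dd iflag=skip_bytes skip=5 if=demo.txt of=out.txt 2>/dev/null
cat out.txt   # prints: fgh
```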
CodePudding user response:
You can also use tail:
# display from 4th byte
# in other words, remove first 3 bytes
$ printf 'apple\nbanana\nfig\ncherry\n' | tail -c +4
le
banana
fig
cherry
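In terms of the question's N: tail's byte offsets are one-based, so skipping the first N characters means starting at byte N+1. A small sketch:

```shell
N=5
# +$((N + 1)) = +6: start output at byte 6, dropping the first 5 bytes
printf 'abcdefgh' | tail -c +"$((N + 1))"   # prints: fgh
```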
CodePudding user response:
With cut (note that cut works line by line, and character positions are one-based, so to remove the first n characters you start at column n+1):
n=5; cut -c$((n+1))- file.txt
It looks like you want to save each line in a separate file, named by line number:
n=5; cut -c$((n+1))- file.txt | awk '{print $0 > NR}'
or just the first line:
n=5; cut -c$((n+1))- file.txt | awk '{print $0 > NR; exit}'
CodePudding user response:
You know, you can also use hexdump:
hexdump -s 5 -ve '/1 "%c"' inputfile > outfile
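As a quick illustration (assuming the util-linux/BSD hexdump: -s skips bytes, -v prints all bytes without squeezing, and the -e format emits each remaining byte back out as a plain character):

```shell
printf 'abcdefgh' > demo.txt   # sample input
# -s 5 skips five bytes; -ve '/1 "%c"' prints every remaining byte as-is
hexdump -s 5 -ve '/1 "%c"' demo.txt   # prints: fgh
```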
CodePudding user response:
You could do something hacky and ugly like this -
awk 'BEGIN{ left=100 } { if (left>0) { len=length($0); if (len<left) { left-=len+1; next } else { print substr($0,left+1); left=0; next } } else print $0 }' infile
Don't, please... Use Ed's sed instead.
You could use Perl -
perl -e 'seek(STDIN,100,0) && print <>' < infile # simpler
perl -e '$/=undef; open(my $fh,$ARGV[0]); seek($fh,100,0) && print <$fh>' infile # cleaner
but William's dd works on binaries without requiring any code...
dd bs=1 skip=100 < infile > outfile
and Sundeep's tail is probably most on target for text files, if your version understands the + offset -
tail -c +101 infile # start at byte 101, having skipped the first 100