I have a text file called file00.txt like this:
ABC_0000 AA
CDE_0000 BB
EFG_0000 CC
ABC_0001 DD
CDE_0001 EE
EFG_0001 FF
which should be separated into two different files, like file1.txt:
ABC_0000 AA
CDE_0000 BB
EFG_0000 CC
and file2.txt
ABC_0001 DD
CDE_0001 EE
EFG_0001 FF
What I have been trying:
cat file00.txt | awk '{print $1}' | sed 's/.*\(....\)/\1/'
This gets only the number from the first word, but I am not able to use it to go on and separate the file into the two files.
Any help is much appreciated.
Answer:
If you only need to do this for two numbers, you can use grep (with > rather than >>, so that re-running it does not append duplicate lines to existing files):
grep "_0000" file00.txt > file1.txt
grep "_0001" file00.txt > file2.txt
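If there are more than two numbers, the same grep idea can be generalized with a loop over the distinct keys, reusing the awk/sed extraction from the question. A minimal sketch, assuming the key is everything after the `_` in the first word and that the groups should go to file1.txt, file2.txt, ... in ascending key order; the scratch directory and sample data are only there to make it self-contained:

```shell
#!/bin/sh
# Demo only: work in a scratch directory and recreate the sample input.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'ABC_0000 AA' 'CDE_0000 BB' 'EFG_0000 CC' \
              'ABC_0001 DD' 'CDE_0001 EE' 'EFG_0001 FF' > file00.txt

# Distinct keys (0000, 0001, ...) taken from the first word, sorted.
n=0
for key in $(awk '{print $1}' file00.txt | sed 's/.*_//' | sort -u); do
    n=$((n + 1))
    # The trailing space stops "_0001" from also matching e.g. "_00010".
    grep "_${key} " file00.txt > "file${n}.txt"
done
```

This reads the input once per key, which is fine for a handful of groups; the awk answers below do it in a single pass.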
Answer:
1st solution: Assuming your entries are sorted by the value of the 2nd field (0000, 0001 and so on), please try the following awk program with your shown samples.
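awk -F'_| ' '
NR==1 || prev!=$2{
  # an unset prev can compare numerically equal to "0000",
  # so NR==1 makes sure the first output file is opened
  count++
  close(outputFile)
  outputFile=("file"count".txt")
  prev=$2
}
{
  print > (outputFile)
}
' Input_file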
2nd solution: a sort + awk combination, in case the entries are not sorted.
awk -F'_| ' '{print $2,$0}' Input_file |
sort -nk1 |
awk -F'_| ' '
{
  # strip the sort key that the first awk put in front of each line
  sub(/^[^[:space:]]+[[:space:]]+/,"")
}
NR==1 || prev!=$2{
  count++
  close(outputFile)
  outputFile=("file"count".txt")
  prev=$2
}
{
  print > (outputFile)
}
'
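Because the number directly follows the underscore, sort can also key on it by itself, which avoids prefixing each line with its key and stripping it off again. A minimal sketch, assuming the key always starts right after the first `_` and using the same "new key, new file" counting as the first solution (the sample lines are shuffled on purpose; the scratch directory is for the demo only):

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
# Sample input in a deliberately unsorted order (demo only).
printf '%s\n' 'ABC_0001 DD' 'ABC_0000 AA' 'CDE_0001 EE' \
              'CDE_0000 BB' 'EFG_0000 CC' 'EFG_0001 FF' > file00.txt

# -t_ splits on "_", so field 2 is "0000 AA"; -k2,2n compares it numerically
# (only the leading digits count), and ties fall back to a whole-line compare.
sort -t_ -k2,2n file00.txt |
awk -F'[_ ]' '
NR == 1 || prev != $2 {          # a new key starts a new output file
    if (out != "") close(out)    # keep only one output file open at a time
    out = "file" (++count) ".txt"
    prev = $2
}
{ print > out }
'
```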
Answer:
The typical way to do this with awk is something like:
awk '{outfile = sprintf("file%d.txt", $2 + 1); print > outfile}' FS='[_ ]' input
The way you parse out the relevant number will change with the input format. Also, as the input file grows larger (and with it the number of distinct output files), you may need to worry about running out of open file descriptors, since awk keeps every redirected file open, so you might want to close the files explicitly with something like:
awk '{outfile = sprintf("file%d.txt", $2 + 1); print >> outfile; close(outfile)}' FS='[_ ]' input
This requires some additional logic to ensure that the output files are empty or do not exist before you begin, because >> appends to whatever is already there.
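One way to add that logic inside awk itself is to truncate each output file the first time it is seen and only append afterwards. A minimal sketch; the `seen` array and the `split_file` function name are arbitrary, and the sample file and scratch directory only make it self-contained:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'ABC_0000 AA' 'CDE_0000 BB' 'EFG_0000 CC' \
              'ABC_0001 DD' 'CDE_0001 EE' 'EFG_0001 FF' > file00.txt

split_file() {
    awk '{
        outfile = sprintf("file%d.txt", $2 + 1)
        if (!(outfile in seen)) {   # first line for this output file:
            seen[outfile] = 1
            printf "" > outfile     # ">" creates or empties it,
            close(outfile)          # then releases the descriptor
        }
        print >> outfile            # append, and close again so that at most
        close(outfile)              # one output file is open at any time
    }' FS='[_ ]' file00.txt
}

split_file
split_file    # running it twice must not duplicate any lines
```

Opening and closing the file on every line is slower than keeping it open, so this trade-off only pays off when there are many distinct output files.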
Answer:
This might work for you (GNU uniq and csplit):
uniq -s4 -w4 --group file | csplit --suppress-matched -f file -b '%02d.txt' - '/^$/' '{*}'
Use uniq to separate each group of lines by an empty line, where groups are determined by skipping the first 4 characters and matching on the following 4.
Pass the output from the above into csplit.
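Suppress the matching lines (the blank lines) and split the groups into files named with the prefix file and the suffix %02d.txt, where %02d is replaced by 00, 01, ... (so file00.txt, file01.txt, ...). If your input is itself named file00.txt, pick a different prefix with -f so that it is not overwritten.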
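To see what the two stages do, the pipeline can be run on the sample data with the intermediate uniq output printed first. A small demo, assuming GNU coreutils recent enough to support uniq --group and csplit --suppress-matched; the input is named file as in the answer, and -s only silences the byte counts csplit normally prints:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'ABC_0000 AA' 'CDE_0000 BB' 'EFG_0000 CC' \
              'ABC_0001 DD' 'CDE_0001 EE' 'EFG_0001 FF' > file

# Stage 1: a blank line is inserted wherever characters 5-8 change.
uniq -s4 -w4 --group file

# Stage 2: cut at each blank line and drop the blank lines themselves.
uniq -s4 -w4 --group file |
    csplit -s --suppress-matched -f file -b '%02d.txt' - '/^$/' '{*}'

ls file*.txt    # lists file00.txt and file01.txt
```

Note that uniq only groups adjacent lines, so like the first awk solution this relies on the input already being ordered by the number.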