How to print lines where it matches a specific number in bash?

I have a text file called file00.txt like below,

ABC_0000  AA
CDE_0000  BB
EFG_0000  CC
ABC_0001  DD
CDE_0001  EE
EFG_0001  FF

which should be separated into two different files, like file1.txt

ABC_0000  AA
CDE_0000  BB
EFG_0000  CC

and file2.txt

ABC_0001  DD
CDE_0001  EE
EFG_0001  FF

What I have been trying is cat file00.txt | awk '{print $1}' | sed 's/.*\(....\)$/\1/' to get only the numbers from the first word, but I am not able to use this to go forward and separate the lines into the two files.

Any help is much appreciated.

CodePudding user response：

If you only need to do this for 2 numbers, you can use

grep "_0000" file00.txt > file1.txt
grep "_0001" file00.txt > file2.txt
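
If there are more than two numbers, the same grep idea can be looped over each distinct number. This is only a minimal sketch: it assumes the number is always the text between the underscore and the first space, and that the output files are numbered in sorted order of those numbers. The sample file is recreated here only so the snippet runs on its own.

```shell
# Recreate the sample input from the question (for illustration only).
printf '%s\n' 'ABC_0000  AA' 'CDE_0000  BB' 'EFG_0000  CC' \
              'ABC_0001  DD' 'CDE_0001  EE' 'EFG_0001  FF' > file00.txt

n=0
# List each distinct number once, in sorted order.
for id in $(cut -d_ -f2 file00.txt | cut -d' ' -f1 | sort -u); do
  n=$((n + 1))
  # ">" truncates, so every run starts each output file fresh.
  grep "_${id} " file00.txt > "file${n}.txt"
done
```

This reads the input once per distinct number, so it is only reasonable for small files with a handful of groups.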

CodePudding user response：

1st solution: Assuming your entries are sorted by the value of the 2nd field (0000, 0001 and so on, with _ and spaces as field separators), please try the following awk program with your shown samples.

awk -v count="0" -F'_| +' '
# A new number in the 2nd field starts a new output file.
prev!=$2{
  count++
  close(outputFile)
  outputFile=("file"count".txt")
  prev=$2
}
{
  print > (outputFile)
}
'  Input_file

2nd solution: Using a sort + awk combination in case the entries are not sorted.

awk -F'_| +' '{print $2,$0}' Input_file |
sort -nk1                               |
awk -v count="0" -F'_| +' '
# Strip the sort key that the first awk prepended.
{
  sub(/^[^[:space:]]+[[:space:]]+/,"")
}
prev!=$2{
  count++
  close(outputFile)
  outputFile=("file"count".txt")
  prev=$2
}
{
  print > (outputFile)
}
'

CodePudding user response：

The typical way to do this with awk is something like:

awk '{outfile = sprintf("file%d.txt", $2 + 1); print > outfile}' FS='[_ ]' input

The manner in which you parse out the relevant number will change with the input format. Also, awk keeps every output file open until it exits, so as the number of distinct output files grows you may need to worry about running out of resources. You might want to close the files explicitly with something like:

awk '{outfile = sprintf("file%d.txt", $2 + 1); print >> outfile; close(outfile)}' FS='[_ ]' input

Because >> appends, this requires some additional logic to ensure that the files are empty or do not exist before you begin.
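
One possible way to add that logic inside the same awk program is to truncate each output file the first time it is used and append afterwards. This is only a sketch: the seen array is a name introduced here for illustration, and the sample input and the leftover file are created only so the snippet runs on its own.

```shell
# Recreate the sample input from the question (for illustration only).
printf '%s\n' 'ABC_0000  AA' 'CDE_0000  BB' 'EFG_0000  CC' \
              'ABC_0001  DD' 'CDE_0001  EE' 'EFG_0001  FF' > input

# Simulate a leftover output file from an earlier run.
echo 'stale line' > file1.txt

awk '{
  outfile = sprintf("file%d.txt", $2 + 1)
  if (!(outfile in seen)) {
    seen[outfile] = 1
    printf "" > outfile   # first use: create or truncate the file
    close(outfile)
  }
  print >> outfile        # every use: append the current line
  close(outfile)          # keep at most one output file open
}' FS='[_ ]' input
```

Opening and closing a file for every line is slower than keeping it open, so this trade-off mainly pays off when there are very many distinct output files.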

CodePudding user response：

This might work for you (GNU uniq and csplit):

uniq -s4 -w4 --group file | csplit --supp -f file -b '%02d.txt' - '/^$/' '{*}'

Use uniq to separate each group of lines by an empty line, where groups are determined by skipping the first 4 characters and matching on the following 4.

Pass the output from the above into csplit.

Suppress the matching lines (blank lines) and split the groups into files named with the prefix file and a suffix of %02d.txt, where %02d is replaced by 00, 01, and so on.
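
To see what csplit receives, it can help to run the uniq step on its own. This is a small sketch on the sample data from the question, assuming GNU uniq (the --group option is a GNU extension); grouped.txt is a name chosen here for illustration.

```shell
# Recreate the sample input from the question (for illustration only).
printf '%s\n' 'ABC_0000  AA' 'CDE_0000  BB' 'EFG_0000  CC' \
              'ABC_0001  DD' 'CDE_0001  EE' 'EFG_0001  FF' > file

# Skip 4 characters ("ABC_"), compare the next 4 ("0000"), and
# print an empty line between runs of lines that compare equal.
uniq -s4 -w4 --group file > grouped.txt
cat grouped.txt
```

grouped.txt then holds the three 0000 lines, one empty line, and the three 0001 lines. Like the first awk solution, this relies on the input already being sorted by number, since uniq only compares adjacent lines.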