Separate lines with keys and store in different files

Time:05-19

How can I extract every DEBUG line from a text file and write the lines for each hexadecimal key to its own file, where the key appears in the format "[ uid key]"? Lines that are not DEBUG should be ignored.

in.txt:

  [ uid 28fd4583833] DEBUG web.Action
  [ uid 39fd5697944] DEBUG test.Action
  [ uid 56866969445] DEBUG test2.Action
  [ uid 76696944556] INFO  test4.Action
  [ uid 39fd5697944] DEBUG test7.Action
  [ uid 85483e10256] DEBUG testing.Action

The output files should be named "out" i ".txt" (out1.txt, out2.txt, ...), where i = 1, 2, 3, 4, i.e.:

out1.txt:

  [ uid 28fd4583833] DEBUG web.Action

out2.txt:

  [ uid 39fd5697944] DEBUG test.Action
  [ uid 39fd5697944] DEBUG test7.Action

out3.txt:

  [ uid 56866969445] DEBUG test2.Action

out4.txt:

  [ uid 85483e10256] DEBUG testing.Action

I tried:

awk 'match($0, /uid ([^]]+)/, a) && /DEBUG/ {print > (a[1] ".txt")}' in.txt

Answer:

If you are willing to change the output file names to include the keys (frankly, this seems more useful than a one-up counter in the names), you can do:

awk '/DEBUG/{print > ("out-" $3 ".txt")}' FS='[][ ]*'  in.txt

This will put all lines that match the string DEBUG with key 85483e10256 into the file out-85483e10256.txt, etc.
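To illustrate the grouping, here is a minimal Python sketch of the same key-named output. It is not part of the answer, and split_by_key and write_files are hypothetical names. It keeps lines containing DEBUG, splits each line on runs of square brackets and spaces, and uses the third field (awk's $3) as the key:

```python
import re

# Sample input from the question (leading indentation dropped).
SAMPLE = [
    "[ uid 28fd4583833] DEBUG web.Action",
    "[ uid 39fd5697944] DEBUG test.Action",
    "[ uid 56866969445] DEBUG test2.Action",
    "[ uid 76696944556] INFO  test4.Action",
    "[ uid 39fd5697944] DEBUG test7.Action",
    "[ uid 85483e10256] DEBUG testing.Action",
]

# A run of "[", "]" or spaces, playing the role of the awk field separator.
FS = re.compile(r"[\[\] ]+")


def split_by_key(lines):
    """Map "out-<key>.txt" to the DEBUG lines that carry that key."""
    files = {}
    for line in lines:
        if "DEBUG" not in line:          # like the awk pattern /DEBUG/
            continue
        key = FS.split(line)[2]          # fields: ['', 'uid', key, 'DEBUG', ...]
        files.setdefault("out-" + key + ".txt", []).append(line)
    return files


def write_files(files):
    """Write each group of lines to its own file in the current directory."""
    for name, lines in files.items():
        with open(name, "w") as fh:
            fh.write("\n".join(lines) + "\n")
```

Calling write_files(split_by_key(SAMPLE)) would create out-28fd4583833.txt, out-39fd5697944.txt (two lines), out-56866969445.txt and out-85483e10256.txt.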

If you do want to keep the one-up counter, you could do:

awk '/DEBUG/{if( ! a[$3] ) a[$3] = ++counter;
     print > ("out" a[$3] ".txt")}' FS='[][ ]*'  in.txt

Basically, the idea is to use the regex [][ ]* as the field separator, which matches a run of square brackets and spaces. This way, $1 is the text preceding the initial [ (empty for these lines), $2 is the string uid, and $3 is the key. This will (should!) still get the key correctly for lines whose white space differs slightly. The associative array a records the counter value assigned to each key the first time it is seen, so later lines with the same key go to the same file. But it really is cleaner just to use the key in the output file name.
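The field splitting and the first-seen counter can be sketched in Python as an illustration only (number_by_first_seen is a hypothetical name). The sketch uses re.split as a stand-in for awk field splitting, and uses + rather than * in the separator because Python's re.split also splits on empty matches:

```python
import re

SAMPLE = [
    "[ uid 28fd4583833] DEBUG web.Action",
    "[ uid 39fd5697944] DEBUG test.Action",
    "[ uid 56866969445] DEBUG test2.Action",
    "[ uid 76696944556] INFO  test4.Action",
    "[ uid 39fd5697944] DEBUG test7.Action",
    "[ uid 85483e10256] DEBUG testing.Action",
]

# Non-empty run of "[", "]" or spaces ("+" instead of the awk "*", see above).
FS = re.compile(r"[\[\] ]+")


def number_by_first_seen(lines):
    """Return {"outN.txt": lines}, numbering keys 1, 2, ... as first seen."""
    counter = 0
    index = {}  # key -> file number, like the awk array a[$3]
    files = {}
    for line in lines:
        if "DEBUG" not in line:
            continue
        key = FS.split(line)[2]          # awk $3
        if key not in index:             # awk: if( ! a[$3] ) a[$3] = ++counter
            counter += 1
            index[key] = counter
        files.setdefault("out%d.txt" % index[key], []).append(line)
    return files
```

For the sample input this yields out1.txt to out4.txt with the same contents as the expected output in the question.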

Answer:

If your file format is as consistent as you show, you can just do:

awk '
    $4!="DEBUG" { next }
    !f[$3] { f[$3]=++i }
    { print > ("out" f[$3] ".txt") }
' in.txt
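This version relies on awk's default field splitting: $3 is "28fd4583833]" (bracket included, which is harmless as a key) and the DEBUG test is an exact comparison on $4. A rough Python equivalent, assuming str.split() with no arguments as a stand-in for awk's default FS (split_default is a hypothetical name):

```python
SAMPLE = [
    "[ uid 28fd4583833] DEBUG web.Action",
    "[ uid 39fd5697944] DEBUG test.Action",
    "[ uid 56866969445] DEBUG test2.Action",
    "[ uid 76696944556] INFO  test4.Action",
    "[ uid 39fd5697944] DEBUG test7.Action",
    "[ uid 85483e10256] DEBUG testing.Action",
]


def split_default(lines):
    """Return {"outN.txt": lines} using whitespace-separated fields."""
    seen = {}   # key -> file number, like the awk array f
    files = {}
    for line in lines:
        f = line.split()                      # f[0] is $1, f[1] is $2, ...
        if len(f) < 4 or f[3] != "DEBUG":     # awk: $4!="DEBUG" { next }
            continue
        key = f[2]                            # e.g. "28fd4583833]"
        if key not in seen:                   # awk: !f[$3] { f[$3]=++i }
            seen[key] = len(seen) + 1
        files.setdefault("out%d.txt" % seen[key], []).append(line)
    return files
```

Because the test is on the fourth field exactly, a line such as "[ uid abc] INFO DEBUG.Action" is ignored, whereas a /DEBUG/ pattern would match it.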

Answer:

1st solution: Using GNU awk, try the following single awk program. It uses GNU awk's PROCINFO["sorted_in"] so that the for (i in arr) loop in the END block visits the keys in sorted order.

awk '
BEGIN{
  PROCINFO["sorted_in"] = "@ind_num_asc"
}
!/DEBUG/{ next }
match($0,/uid [a-zA-Z0-9]+/){
  ind=substr($0,RSTART,RLENGTH)
  arr[ind]=(arr[ind]?arr[ind] ORS:"") $0
}
END{
  for(i in arr){
    outputFile=("out" ++count".txt")
    print arr[i] > (outputFile)
    close(outputFile)
  }
}
'  Input_file
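The collect-then-write idea can be sketched in Python as follows. This is an illustration, not the answer's code: group_sorted is a hypothetical name, and the sketch assumes that sorted traversal means ascending string order of indices such as "uid 28fd4583833":

```python
import re

SAMPLE = [
    "[ uid 28fd4583833] DEBUG web.Action",
    "[ uid 39fd5697944] DEBUG test.Action",
    "[ uid 56866969445] DEBUG test2.Action",
    "[ uid 76696944556] INFO  test4.Action",
    "[ uid 39fd5697944] DEBUG test7.Action",
    "[ uid 85483e10256] DEBUG testing.Action",
]

KEY = re.compile(r"uid [a-zA-Z0-9]+")  # same regex as the match() call


def group_sorted(lines):
    """Return {"outN.txt": lines}, numbering keys in ascending string order."""
    arr = {}  # "uid <key>" -> list of lines, like the awk array arr[ind]
    for line in lines:
        if "DEBUG" not in line:          # !/DEBUG/{ next }
            continue
        m = KEY.search(line)
        if m:
            arr.setdefault(m.group(0), []).append(line)
    # END block: visit the keys in sorted order and number the files 1, 2, ...
    return {"out%d.txt" % n: arr[k] for n, k in enumerate(sorted(arr), start=1)}
```

For the sample input the numbering matches the expected out1.txt to out4.txt only because the keys first appear in ascending order; with other data, the numbering follows key order rather than order of first appearance.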


2nd solution: With any awk, and based on your shown samples, please try the following pipeline. Replace Input_file with your actual file's name.

awk '
!/DEBUG/{ next }
match($0,/uid [0-9a-zA-Z]+/){
  print substr($0,RSTART,RLENGTH)";"$0
}' Input_file |
sort -s -t';' -k1,1 |
cut -d';' -f2- |
awk '
match($0,/uid [0-9a-zA-Z]+/){
  if(prev!=substr($0,RSTART,RLENGTH)){
    count++
    close(outputFile)
  }
  outputFile="out"count".txt"
  print > (outputFile)
  prev=substr($0,RSTART,RLENGTH)
}
'
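The pipeline is a decorate-sort-undecorate pattern: prefix each line with its key, sort on the key, strip the key, then start a new file whenever the key changes. A Python sketch of that pattern (decorate_sort_number is a hypothetical name) sorts on the key as a plain string and relies on a stable sort. A numeric sort would not order these keys sensibly, since a key such as 85483e10256 only has a numeric prefix of 85483:

```python
import re
from itertools import groupby

SAMPLE = [
    "[ uid 28fd4583833] DEBUG web.Action",
    "[ uid 39fd5697944] DEBUG test.Action",
    "[ uid 56866969445] DEBUG test2.Action",
    "[ uid 76696944556] INFO  test4.Action",
    "[ uid 39fd5697944] DEBUG test7.Action",
    "[ uid 85483e10256] DEBUG testing.Action",
]

KEY = re.compile(r"uid [0-9a-zA-Z]+")  # same regex as the match() calls


def decorate_sort_number(lines):
    """Return {"outN.txt": lines}, numbering the files in sorted key order."""
    # Step 1 (first awk): keep DEBUG lines and pair each one with its key.
    decorated = []
    for line in lines:
        if "DEBUG" not in line:
            continue
        m = KEY.search(line)
        if m:
            decorated.append((m.group(0), line))
    # Step 2 (sort): order on the key only, compared as a string. list.sort()
    # is stable, so lines sharing a key keep their input order.
    decorated.sort(key=lambda pair: pair[0])
    # Steps 3 and 4 (cut + second awk): drop the key again and start a new
    # output file each time the key changes.
    files = {}
    groups = groupby(decorated, key=lambda pair: pair[0])
    for count, (_key, group) in enumerate(groups, start=1):
        files["out%d.txt" % count] = [line for _, line in group]
    return files
```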

Explanation of the 1st solution:

awk '                                       ##Starting awk program from here.
BEGIN{                                      ##Starting BEGIN section from here.
  PROCINFO["sorted_in"] = "@ind_num_asc"    ##Setting PROCINFO["sorted_in"] so that for (i in arr) visits the indices in sorted order.
}
!/DEBUG/{ next }                            ##If a line does not contain DEBUG then jump to next line.
match($0,/uid [a-zA-Z0-9]+/){               ##Using match function to match uid, a space and the alphanumeric key.
  ind=substr($0,RSTART,RLENGTH)             ##Creating ind, which contains the substring matched by the match function.
  arr[ind]=(arr[ind]?arr[ind] ORS:"") $0    ##Creating array arr indexed by ind and appending the current line to that index.
}
END{                                        ##Starting END block of this program from here.
  for(i in arr){                            ##Traversing through array arr here.
    outputFile=("out" ++count".txt")        ##Creating output file name (out1.txt, out2.txt, ...) as per OP requirement.
    print arr[i] > (outputFile)             ##Printing current array element into the file named by outputFile.
    close(outputFile)                       ##Closing output file to avoid a too many open files error.
  }
}
'  Input_file                               ##Mentioning Input_file name here.