Extracting blocks of lines from one file based on values in another file

Time:08-17

I have two files, as follows:

cat file1.txt

256
258

cat file2.txt

2.1 56.8 85.1 256
2.2 56.8 85.2 256
2.3 56.8 85.3 256
2.4 56.8 85.4 258
2.5 56.8 85.5 258

I want to extract the blocks of lines from file2.txt whose 4th column value matches a value listed in file1.txt. Blank lines in file1.txt should be skipped and must not produce any output.

Expected output: one output file per matching value, in the following format.

first_block

2.1 56.8 85.1 256
2.2 56.8 85.2 256
2.3 56.8 85.3 256

second_block

2.4 56.8 85.4 258
2.5 56.8 85.5 258

The code I tried is:

#!/bin/sh

for file in `cat file1.txt`
do
    ndt=$file
    echo $ndt
    for #here unable to proceed
done

But I am unable to match the values and extract the corresponding blocks. Any help is highly appreciated.

CodePudding user response:

With your shown samples, please try the following awk code.

awk '
FNR==NR{
  arr[$0]
  next
}
!NF{ next }
prev!=$NF{
  close(outputFile)
  outputFile=(++count)"_block.txt"
}
{
  print > (outputFile)
  prev=$NF
}
' file1.txt <(sort -k4 -n file2.txt)

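NOTE: sort -k4 -n orders the lines of file2.txt by the 4th column, so lines with the same value end up next to each other. If you have GNU sort and want lines with equal 4th-column values to keep their original relative order, change the sort command to sort -s -k4 -n file2.txt. The <( ... ) syntax is process substitution, which needs bash, ksh or zsh; it is not available in a plain POSIX sh.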

Explanation: a detailed, line-by-line explanation of the above code.

awk '                             ##Starting awk program from here.
FNR==NR{                          ##Checking condition which will be TRUE when file1.txt is being read.
  arr[$0]                         ##Creating array arr with index of current line.
  next                            ##next will skip all further statements from here.
}
!NF{ next }                       ##Skipping blank lines (lines that have no fields).
prev!=$NF{                        ##If prev is NOT equal to $NF (the last field) then do following.
  close(outputFile)               ##Closing the previous output file.
  outputFile=(++count)"_block.txt" ##Setting outputFile to the incremented count followed by string _block.txt
}
{                                 ##Running this block for every remaining line.
  print > (outputFile)            ##Printing current line into the file whose name is stored in outputFile.
  prev=$NF                        ##Setting $NF value to prev.
}
' file1.txt <(sort -k4 -n file2.txt)  ##Passing file1.txt and the sorted file2.txt as input.
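
As far as I can tell, the code above stores the values of file1.txt in arr but never looks them up, so a line of file2.txt whose 4th column is not listed in file1.txt would still get its own block file. The following is only a sketch of one possible variant, not the original answer's method: it assigns a block number to each value in the order the values appear in file1.txt, writes only the matching lines, and needs neither sort nor process substitution, so it should also run under a plain sh. The directory name block_demo, the array name blk and the extra 300 row are made up for illustration.

```shell
#!/bin/sh
# Work in a scratch directory (hypothetical name) so no existing files are touched.
mkdir -p block_demo && cd block_demo || exit 1

# Recreate the sample inputs from the question; file1.txt contains a blank line.
cat > file1.txt <<'EOF'
256

258
EOF

# The last row (4th column 300) is an invented non-matching line.
cat > file2.txt <<'EOF'
2.1 56.8 85.1 256
2.2 56.8 85.2 256
2.3 56.8 85.3 256
2.4 56.8 85.4 258
2.5 56.8 85.5 258
2.6 56.8 85.6 300
EOF

awk '
!NF { next }                       # skip blank lines in either file
FNR == NR {                        # true while file1.txt is being read
  if (!($1 in blk)) blk[$1] = ++count   # give each new value the next block number
  next
}
($NF in blk) {                     # keep only lines whose last field is listed in file1.txt
  print > (blk[$NF] "_block.txt")  # append the line to the file of its block
}
' file1.txt file2.txt
```

With the sample data this should create 1_block.txt (the three 256 lines) and 2_block.txt (the two 258 lines), and no file for the 300 line. Two caveats: a value in file1.txt that never occurs in file2.txt leaves a gap in the numbering, and every block file stays open until awk exits, which may matter for awk implementations with a low open-file limit when file1.txt lists very many values.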