Linux shell: get the intersection of multiple files


I have a few txt files, for example 1.txt, 2.txt, 3.txt, and 4.txt, and I want to get the intersection of their contents. My current approach is:

cat 1.txt 2.txt | sort | uniq -c > tmp.txt
cat tmp.txt 3.txt | sort | uniq -c > tmp2.txt 
and so on.

Is there a better way?

Sample input:
1.txt
1
2
3
4

2.txt
1
2
3

3.txt
1
2

4.txt
1
5

Expected output:
1

CodePudding user response:

With your shown samples, please try the following awk code.

1st solution: This handles the case where a single input file may itself contain duplicate lines:

awk '
!arr2[FILENAME,$0]++{      ##Count each distinct line only once per file.
  arr1[$0]++               ##Track how many files the line appears in.
}
END{
  for(i in arr1){
    if(arr1[i]==(ARGC-1)){ ##ARGC-1 is the number of input files, so the line is in every file.
       print i
    }
  }
}
' *.txt
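For reference, one possible way to run the 1st solution on the four sample files from the question: if the program is saved to a file, say intersect.awk (a name chosen here purely for illustration), then

awk -f intersect.awk 1.txt 2.txt 3.txt 4.txt

prints 1, matching the expected output.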


2nd solution: This solution assumes there are no duplicate lines within any single input file; if that is the case, try the following:

awk '
{
  arr[$0]++
}
END{
  for(i in arr){
    if(arr[i]==(ARGC-1)){
       print i
    }
  }
}
' *.txt
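Under the same no-duplicates assumption, the counting idea can also be expressed with the sort | uniq pipeline from the question in a single pass. This is only a sketch: it hardcodes 4 as the number of input files and assumes the lines contain no whitespace, since uniq -c prepends a count field:

sort 1.txt 2.txt 3.txt 4.txt | uniq -c | awk '$1==4{print $2}'

For the sample files this prints 1 as well.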

Explanation: A detailed explanation of the 2nd solution above.

awk '                      ##Start the awk program.
{
  arr[$0]++                ##Use the whole line ($0) as the index of array arr and increment its count.
}
END{                       ##Start the END block, which runs after all input files have been read.
  for(i in arr){           ##Traverse array arr.
    if(arr[i]==(ARGC-1)){  ##If the count equals the total number of input files, the line appears in every file.
       print i             ##Print the common line.
    }
  }
}
' *.txt                    ##Pass all .txt files as input to the awk program.
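For completeness, another common way to intersect files in the shell (not part of the answer above, just an alternative sketch) is to chain comm -12, which prints only the lines common to two sorted inputs. This assumes bash for the <( ) process substitution and uses sort -u so in-file duplicates do not matter:

comm -12 <(sort -u 1.txt) <(sort -u 2.txt) | comm -12 - <(sort -u 3.txt) | comm -12 - <(sort -u 4.txt)

For the sample files this also prints 1.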