How to use shell to print every line containing the first col in the file where the number of occurr

Time:06-20

Source File

$ cat log.txt

key1 1654684897 1 3 d d
key1 1654684897 1 3 d 2038
key1 1654684997 1 3 c c
key1 1654684997 1 3 c 2038
key1 1654684997 1 3 c 2071
key2 1654684897 1 3 d d
key2 1654684897 1 3 d 2039
key3 1654684997 1 3 c c
key3 1654684997 1 3 c 2038
key3 1654684997 1 3 c 2071

My solution:

$ cat log.txt | awk '{print $1}' | sort | uniq -c | awk '$1>2{print $2}' | xargs -I{} grep -E {} log.txt

Output

key1 1654684897 1 3 d d
key1 1654684897 1 3 d 2038
key1 1654684997 1 3 c c
key1 1654684997 1 3 c 2038
key1 1654684997 1 3 c 2071
key3 1654684997 1 3 c c
key3 1654684997 1 3 c 2038
key3 1654684997 1 3 c 2071

Because my log file is very large, this method is too slow. Is there a faster method?

CodePudding user response:

Make two passes over the file in awk, one to count keys, one to print lines that qualify:

awk 'NR == FNR { keys[$1]++; next }
     keys[$1] > 2' log.txt log.txt
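To check the two-pass command against the sample data from the question, something like the following sketch should work. The scratch directory and the `out` variable are only there for the demonstration and are not part of the answer itself.

```shell
# Recreate the sample log.txt from the question in a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/log.txt" <<'EOF'
key1 1654684897 1 3 d d
key1 1654684897 1 3 d 2038
key1 1654684997 1 3 c c
key1 1654684997 1 3 c 2038
key1 1654684997 1 3 c 2071
key2 1654684897 1 3 d d
key2 1654684897 1 3 d 2039
key3 1654684997 1 3 c c
key3 1654684997 1 3 c 2038
key3 1654684997 1 3 c 2071
EOF

# Pass 1 (NR == FNR is true only while reading the first file argument):
# count how often each first-column key occurs.
# Pass 2: print every line whose key occurred more than twice.
out=$(awk 'NR == FNR { keys[$1]++; next }
           keys[$1] > 2' "$tmp/log.txt" "$tmp/log.txt")

printf '%s\n' "$out"
rm -rf "$tmp"
```

On the sample this prints the five key1 lines and the three key3 lines, in file order, which matches the Output shown in the question. The file is read twice, but awk is started only once, however many distinct keys there are.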

CodePudding user response:

I ended up using this:

awk '{ ORS = "," }
     NR == FNR { keys[$1]++; next }
     keys[$1] >= 100 && ++a[$1] <= 100 {
         if (a[$1] == 1) { print $1; for (i = 3; i <= NF; ++i) print $i }
         else if (a[$1] < 100) { for (i = 3; i <= NF; ++i) print $i }
         else { for (i = 3; i <= NF; ++i) print $i; print "\n" }
     }' log.txt log.txt | sed 's/^,//' >>result

log.txt has 137,002,531 rows, and result has 190,992 rows.

It took only 2 minutes and 56 seconds.
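As I read it, the command in this answer keeps only keys that occur at least 100 times, takes the first 100 lines of each such key, and joins columns 3 through NF of those lines into one comma-separated row that starts with the key. The sketch below shows the same idea on the sample data from the question. The awk variable `n` is a hypothetical stand-in for the hard-coded 100 (set to 3 so the small sample produces output), the three branches are folded into plain `if` statements, and the `rows` variable and scratch directory exist only for the demonstration.

```shell
# Recreate the sample log.txt from the question in a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/log.txt" <<'EOF'
key1 1654684897 1 3 d d
key1 1654684897 1 3 d 2038
key1 1654684997 1 3 c c
key1 1654684997 1 3 c 2038
key1 1654684997 1 3 c 2071
key2 1654684897 1 3 d d
key2 1654684897 1 3 d 2039
key3 1654684997 1 3 c c
key3 1654684997 1 3 c 2038
key3 1654684997 1 3 c 2071
EOF

# n is a hypothetical parameter replacing the literal 100 in the answer.
rows=$(awk -v n=3 '{ ORS = "," }
     NR == FNR { keys[$1]++; next }          # pass 1: count keys
     keys[$1] >= n && ++a[$1] <= n {         # pass 2: first n lines of frequent keys
         if (a[$1] == 1) print $1            # the key starts the row
         for (i = 3; i <= NF; ++i) print $i  # columns 3..NF, each followed by a comma
         if (a[$1] == n) print "\n"          # close the row after the n-th line
     }' "$tmp/log.txt" "$tmp/log.txt" | sed 's/^,//')

printf '%s\n' "$rows"
rm -rf "$tmp"
```

With n=3 the sample yields two rows, `key1,1,3,d,d,1,3,d,2038,1,3,c,c,` and `key3,1,3,c,c,1,3,c,2038,1,3,c,2071,`. Each row keeps a trailing comma because ORS is a comma, and the sed step only strips the comma that would otherwise begin the next row.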
