Source File
$cat log.txt
key1 1654684897 1 3 d d
key1 1654684897 1 3 d 2038
key1 1654684997 1 3 c c
key1 1654684997 1 3 c 2038
key1 1654684997 1 3 c 2071
key2 1654684897 1 3 d d
key2 1654684897 1 3 d 2039
key3 1654684997 1 3 c c
key3 1654684997 1 3 c 2038
key3 1654684997 1 3 c 2071
my solution:
$cat log.txt | awk '{print $1}' | sort | uniq -c | awk '$1>2{print $2}' | xargs -I{} grep -E {} log.txt
Output
key1 1654684897 1 3 d d
key1 1654684897 1 3 d 2038
key1 1654684997 1 3 c c
key1 1654684997 1 3 c 2038
key1 1654684997 1 3 c 2071
key3 1654684997 1 3 c c
key3 1654684997 1 3 c 2038
key3 1654684997 1 3 c 2071
Because my log file is very large, this method is too time-consuming (xargs -I runs a separate grep over the whole file for every qualifying key). Is there a faster method?
CodePudding user response:
Make two passes over the file in awk, one to count keys, one to print lines that qualify:
awk 'NR == FNR { keys[$1]++ ; next }
     keys[$1] > 2' log.txt log.txt
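If the file cannot be read twice (for example, if it arrives on a pipe), a single-pass variant is possible; this is not part of the answer above, just a sketch that buffers every line of every key in memory and prints the qualifying groups at the end:

awk '{ count[$1]++; lines[$1] = lines[$1] $0 "\n" }
     END { for (k in count) if (count[k] > 2) printf "%s", lines[k] }' log.txt

Note the output is grouped by key rather than in file order, and for a file of this size the buffer costs roughly as much memory as the file itself, so the two-pass version is usually the better fit here.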
CodePudding user response:
I ended up using this:
awk '{ORS=","}NR == FNR { keys[$1] ; next }
keys[$1] >= 100 && a[$1] <= 100
{if (a[$1]==1){print $1; for(i=3;i<=NF; i) print $i}
else if (a[$1]<100) {for(i=3;i<=NF; i) print $i }
else {for(i=3;i<=NF; i) print $i; print "\n" } }'
log.txt log.txt | sed 's/^,//' >>result
log.txt has 137,002,531 rows, and result has 190,992 rows.
It only took 2 minutes and 56 seconds.
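For reference, the same grouping could be written without the ORS/sed trick by building each record in a string and printing it once; this is only a rough sketch under the same assumptions (keys with at least 100 occurrences, only their first 100 lines, fields 3..NF joined by commas), not the command that produced the timing above:

awk 'NR == FNR { keys[$1]++; next }                # pass 1: count lines per key
     keys[$1] >= 100 && ++a[$1] <= 100 {           # pass 2: first 100 lines of frequent keys
         if (a[$1] == 1) rec[$1] = $1              # start the record with the key itself
         for (i = 3; i <= NF; i++) rec[$1] = rec[$1] "," $i
     }
     END { for (k in rec) print rec[k] }' log.txt log.txt > result

Note that for (k in rec) does not guarantee any particular key order, and all records are held in memory until the END block.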