Listing skipped numbers in a large txt file using bash

I need to find a way to display the missing numbers from a large txt file. It's a web graph that has 875,713 vertices. However, when I sort the file, the largest number displayed at the end is 916,427, so some numbers are not being used as vertex indices. Is there a bash command I could use to list them?

I found this after searching around some other threads, but I'm not entirely sure if it's correct:

awk 'NR != $1 { for (i = prev + 1; i < $1; i++) {print i} } { prev = $1 + 1 }' file

CodePudding user response：

If you don't want to store the array in memory (otherwise @jared_mamrot's solution would work), you can use

awk 'NR==1 {p=$1; next} {for (i=p+1; i<$1; i++) {print i}; p=$1}' < <(sort -n file)

which sorts the file first.
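
To see what the sort-first approach does, here is a small sketch on made-up data. The five sample IDs are purely illustrative and are not taken from the real graph file. It uses a plain pipe instead of process substitution, which should behave the same way here.

```shell
# Hypothetical sample: unsorted vertex IDs with 3, 4 and 7 missing.
# Sort numerically, then print every integer skipped between neighbours.
# Only the previous ID (p) is held in memory, never the whole file.
missing=$(printf '%s\n' 5 1 2 8 6 |
  sort -n |
  awk 'NR==1 {p=$1; next} {for (i=p+1; i<$1; i++) print i; p=$1}')

echo "$missing"
```

This prints 3, 4 and 7, one per line. Two things to keep in mind: duplicate IDs are harmless because the loop body never runs when the current ID equals the previous one, and any IDs missing below the smallest ID in the file are not reported, since the scan starts from the first line.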

CodePudding user response：

Assuming the 'number' of each vertex is in the first column, you can use:

awk '{a[$1]} END{for(i = 1; i <= 916427; i++){if(!(i in a)){print i}}}' file

E.g.

# create some example data and remove "10"
seq 916427 | sed '10d' > test.txt

head test.txt
1
2
3
4
5
6
7
8
9
11

awk '{a[$1]} END { for (i = 1; i <= 916427; i++) { if (!(i in a)) {print i}}}' test.txt
10
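
If you'd rather not hardcode 916427, one possible variant tracks the largest ID while reading and uses it as the loop bound. This is only a sketch: the names seen and max are my own, the sample data is made up, and it assumes IDs are positive integers starting at 1 in the first column.

```shell
# Hypothetical sample: IDs 1..12 with 4 and 10 removed.
missing=$(seq 12 | sed -e '4d' -e '10d' |
  awk '{
         seen[$1]                          # record the ID as an array key
         if ($1 + 0 > max) max = $1 + 0    # remember the largest ID so far
       }
       END {
         # walk 1..max and report every ID that never appeared
         for (i = 1; i <= max; i++) if (!(i in seen)) print i
       }')

echo "$missing"
```

This prints 4 and 10. Like the original command it keeps every distinct ID in memory, so the trade-off against the sort-based answer is unchanged. If the vertex numbering starts at 0, the loop would need to start there instead.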