Using Java, how can I find the 100 smallest numbers in a saved file of 10 billion integers (one integer per line)?
CodePudding user response:
10 billion numbers: even assuming each line holds a single number of only 1 byte, that is still 10 × 1000 × 1000 × 1000 bytes, about 10 GB, so reading everything into memory to sort it will most likely run out of memory.
So the only option is an external sort followed by a multiway merge to find the 100 smallest values.
That is: read the file one part at a time, sort each part in memory, and write it out to a small temporary file. Then open all of these temporary files and repeatedly take the smallest value among the current heads of the sorted files. Advance the file pointer of the file that value came from, and leave the pointers of the other files where they are. Stop after 100 values, then delete the temporary files.
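A minimal sketch of the two phases described above, assuming a plain-text file with one integer per line. The class and method names (ExternalMergeSketch, writeSortedChunks, firstK) are hypothetical, and chunkSize stands in for "one part of the file", to be chosen so a chunk fits in memory. Instead of scanning every temporary file for the smallest head value, this version keeps the current heads in a java.util.PriorityQueue, which yields the same values with O(log m) work per step for m files.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper: external sort (phase 1) plus a multiway merge that stops after k values (phase 2).
class ExternalMergeSketch {

    // Phase 1: read up to chunkSize numbers at a time, sort them in memory,
    // and write each sorted chunk to its own temporary file in tmpDir.
    static List<Path> writeSortedChunks(Path input, int chunkSize, Path tmpDir) throws IOException {
        List<Path> chunks = new ArrayList<>();
        int[] buf = new int[chunkSize];
        int n = 0;
        try (BufferedReader in = Files.newBufferedReader(input)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) continue;
                buf[n++] = Integer.parseInt(line);
                if (n == chunkSize) {
                    chunks.add(writeChunk(buf, n, tmpDir));
                    n = 0;
                }
            }
        }
        if (n > 0) chunks.add(writeChunk(buf, n, tmpDir)); // last, partially filled chunk
        return chunks;
    }

    private static Path writeChunk(int[] buf, int n, Path tmpDir) throws IOException {
        Arrays.sort(buf, 0, n);
        Path chunk = Files.createTempFile(tmpDir, "chunk", ".txt");
        try (BufferedWriter out = Files.newBufferedWriter(chunk)) {
            for (int i = 0; i < n; i++) {
                out.write(Integer.toString(buf[i]));
                out.newLine();
            }
        }
        return chunk;
    }

    // Phase 2: each chunk has its own reader (its "file pointer"). Only the reader whose
    // head value was just taken moves forward; the others stay where they are.
    static int[] firstK(List<Path> chunks, int k) throws IOException {
        List<BufferedReader> readers = new ArrayList<>();
        // Heap entries are {value, chunk index}, ordered by value.
        PriorityQueue<int[]> heads = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        try {
            for (int i = 0; i < chunks.size(); i++) {
                BufferedReader r = Files.newBufferedReader(chunks.get(i));
                readers.add(r);
                String line = r.readLine();
                if (line != null) heads.add(new int[] {Integer.parseInt(line), i});
            }
            List<Integer> result = new ArrayList<>();
            while (result.size() < k && !heads.isEmpty()) {
                int[] top = heads.poll(); // smallest head among all chunks
                result.add(top[0]);
                String next = readers.get(top[1]).readLine();
                if (next != null) heads.add(new int[] {Integer.parseInt(next), top[1]});
            }
            return result.stream().mapToInt(Integer::intValue).toArray();
        } finally {
            for (BufferedReader r : readers) r.close();
        }
    }
}
```

Typical use is firstK(writeSortedChunks(input, chunkSize, tmpDir), 100), after which each temporary file can be removed with Files.deleteIfExists(chunk).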
CodePudding user response:
Replying to qybao (1st floor), who proposed an external sort followed by a multiway merge:
Partition method.
CodePudding user response:
1. Based on the machine's memory, set how much data is processed in one batch.
2. Take the first 100 values as the initial minimums.
3. Sort these 100 values in ascending order.
4. Take the remaining values one at a time and compare each with the 100 stored values, from front to back. If the new value is smaller than a stored value, insert it at that position, shift the values behind it back by one, and drop the last one.
5. The rest is just a matter of time and effort; the 100 values left at the end are the minimums.
CodePudding user response:
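The five steps in the reply just before this one can be sketched in Java roughly as follows. This is an illustration, not the poster's code: SortedTopK, offer, add and result are hypothetical names, the caller is assumed to read the file in batches sized for the available memory (step 1), and steps 2 and 3 are folded into inserting the first 100 values in sorted order as they arrive, which produces the same sorted starting buffer. Duplicates are kept, since the steps do not ask for distinct values.

```java
import java.util.Arrays;

// Hypothetical helper: a sorted buffer holding the k smallest values seen so far.
class SortedTopK {
    private final int[] best; // ascending; valid entries are best[0..size-1]
    private int size = 0;

    SortedTopK(int k) {
        best = new int[k];
    }

    // Step 1: the caller feeds one batch at a time; len is the number of valid entries.
    void offer(int[] batch, int len) {
        for (int i = 0; i < len; i++) {
            add(batch[i]);
        }
    }

    void add(int v) {
        int pos;
        if (size < best.length) {
            pos = size++;      // steps 2-3: buffer not full yet, so it grows
        } else if (v < best[size - 1]) {
            pos = size - 1;    // step 4: the current largest value will be dropped
        } else {
            return;            // not smaller than the current k-th smallest value
        }
        // Shift larger values one slot towards the end, then insert v in sorted position.
        while (pos > 0 && best[pos - 1] > v) {
            best[pos] = best[pos - 1];
            pos--;
        }
        best[pos] = v;
    }

    // Step 5: whatever is left in the buffer is the answer, in ascending order.
    int[] result() {
        return Arrays.copyOf(best, size);
    }
}
```

Each value costs at most k comparisons and shifts, so one pass over n values takes O(k·n) time with only O(k) extra memory.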
If sorting is the main concern, merge sort is the algorithm of choice.
CodePudding user response:
Quoting qq_33283446 (2nd floor): "Partition method?"
That's not right. You are only looking for the 100 smallest numbers, so there is no need to sort at all. Traverse the file once while keeping an int[100] array min of the smallest values found so far. Compare each value you read with the elements of min; if it is smaller than one of them, insert it into the array at the right position.
For example, to find the 3 smallest distinct numbers:

    int[] array = {5, 8, 6, 7, 9, 4, 1, 3, 2, 0}; // find the 3 smallest distinct numbers
    int[] min = {Integer.MAX_VALUE, Integer.MAX_VALUE, Integer.MAX_VALUE}; // start above any real value
    int i = 0, j = 0;
    for (int k = 0; k < array.length; k++) {
        for (i = 2; i >= 0 && min[i] > array[k]; i--); // compare array[k] with the min array
        if (i < 0) { // smaller than everything in min: insert at position 0, move the rest back
            for (j = 2; j > 0; j--) min[j] = min[j - 1];
            min[0] = array[k];
        } else if (i != 2 && min[i] != array[k]) { // smaller than some element and not already present: insert after i, move the rest back
            for (j = 2; j > i + 1; j--) min[j] = min[j - 1];
            min[i + 1] = array[k];
        }
    }
    for (i = 0; i < 3; i++) System.out.printf("%d ", min[i]);

Apply the same algorithm while traversing the file. The file is very large, so you can consider splitting it or reading it with multiple threads. Here is a simple example for reference:

    import java.io.RandomAccessFile;
    import java.util.Arrays;
    import java.util.concurrent.CountDownLatch;

    public class Sample {
        public static void main(String[] args) {
            try {
                String file = "D:/test.txt";
                int threadCount = Runtime.getRuntime().availableProcessors(); // available CPUs, used as the number of threads
                int[][] buf = new int[threadCount][100]; // each thread keeps its own 100 minimums (separate data areas avoid locking)
                for (int i = 0; i < threadCount; i++) {
                    Arrays.fill(buf[i], Integer.MAX_VALUE); // initialize each thread's data area
                }
                CountDownLatch countDown = new CountDownLatch(threadCount); // main waits on this; each thread counts down when it ends
                //createTestFile(file, 10000); // generate a test file

                class MinThread extends Thread { // worker thread that reads one region of the file
                    int id = 0;     // index of this thread's data area
                    long start = 0; // first byte of the region
                    long end = 0;   // end of the region (exclusive)

                    public MinThread(int id, long start, long end) {
                        this.id = id;
                        this.start = start;
                        this.end = end;
                    }

                    public void run() {
                        RandomAccessFile raf = null;
                        try {
                            raf = new RandomAccessFile(file, "r");
                            raf.seek(start); // move to this thread's region
                            while (start < end) { // read line by line up to the end of the region
                                String s = raf.readLine();
                                if (s == null) break;
                                start += s.getBytes().length + 1; // assumes '\n' line endings
                                if (s.isEmpty()) continue;
                                int n = Integer.parseInt(s.trim()), i = 0; // keep the 100 smallest distinct numbers
                                for (i = 99; i >= 0 && buf[id][i] > n; i--);
                                if (i < 0) {
                                    for (int j = 99; j > 0; j--) buf[id][j] = buf[id][j - 1];
                                    buf[id][0] = n;
                                } else if (i != 99 && buf[id][i] != n) {
                                    for (int j = 99; j > i + 1; j--) buf[id][j] = buf[id][j - 1];
                                    buf[id][i + 1] = n;
                                }
                            }
                        } catch (Throwable e) {
                            e.printStackTrace();
                        } finally {
                            countDown.countDown(); // count down even on failure so main never waits forever
                            if (raf != null) {
                                try {
                                    raf.close();
                                } catch (Throwable ee) {
                                    ee.printStackTrace();
                                }
                            }
                        }
                    }
                }

                RandomAccessFile raf = new RandomAccessFile(file, "r");
                long totalLength = raf.length(); // total length of the file
                long offset = totalLength / threadCount;
                long start = 0, end = 0;
                for (int i = 0, j = 0; i < threadCount - 1; i++) {
                    raf.seek(start + offset); // tentative end of this thread's region
                    end = 0;
                    while ((j = raf.read()) != -1 && j != '\n') { // extend the region to the end of the current line
                        end++;
                    }
                    end += (start + offset + 1); // first byte after the '\n'
                    new MinThread(i, start, end).start(); // assign the region to a thread
                    start = end; // the next region starts at the next line
                }
                end = totalLength;
                new MinThread(threadCount - 1, start, end).start();
                raf.close();
                countDown.await(); // wait for all threads to finish

                int[] min = new int[100]; // final result
                int[] cur = new int[threadCount], idx = new int[threadCount];
                Arrays.fill(cur, 0);
                for (int i = 0; i < 100; i++) { // merge the 100 minimums found by each thread
                    int tmp = buf[0][cur[0]]; // take one candidate
                    Arrays.fill(idx, 0);
                    idx[0] = 1; // idx[j] == 1 marks the threads whose current value was selected
                    for (int j = 1; j < threadCount; j++) {
                        if (tmp > buf[j][cur[j]]) { // another thread has a smaller value: select it instead
                            tmp = buf[j][cur[j]];
                            Arrays.fill(idx, 0);
                            idx[j] = 1;
                        } else if (tmp == buf[j][cur[j]]) { // equal values are marked too, to remove duplicates
                            idx[j] = 1;
                        }
                    }
                    min[i] = tmp; // save the smallest number
                    // (the original reply is cut off here; the lines below are a minimal completion)
                    for (int j = 0; j < threadCount; j++) {
                        if (idx[j] == 1) cur[j]++; // advance every thread that supplied this value
                    }
                }
                System.out.println(Arrays.toString(min));
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
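For comparison, the same one-pass idea (keep the 100 smallest distinct values) can be written without the manual array shifting by swapping in java.util.TreeSet, a sorted set that ignores duplicates. This is a single-threaded sketch, not the poster's code; DistinctMinScan and smallestDistinct are hypothetical names, and the input is assumed to be one integer per line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.TreeSet;

// Hypothetical single-threaded variant: one pass, keeping the k smallest distinct values.
class DistinctMinScan {
    static int[] smallestDistinct(BufferedReader in, int k) throws IOException {
        if (k <= 0) return new int[0];
        TreeSet<Integer> best = new TreeSet<>(); // ascending order, no duplicates
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) continue;
            int v = Integer.parseInt(line);
            if (best.size() < k) {
                best.add(v);                       // still filling; a duplicate is simply ignored
            } else if (v < best.last() && best.add(v)) {
                best.pollLast();                   // new distinct value: evict the current largest
            }
        }
        return best.stream().mapToInt(Integer::intValue).toArray();
    }
}
```

It could be called as smallestDistinct(Files.newBufferedReader(Paths.get("D:/test.txt")), 100) inside a try-with-resources block. Each accepted value costs O(log k) in the tree instead of up to k shifts, while values not smaller than the current largest are rejected with a single comparison.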