Read a file from a main method, sort its data, and get the smallest numbers

Time:11-01

Using Java, find the 100 smallest numbers in a saved file of 10 billion integers (each entry is one integer).

CodePudding user response:

With 10 billion numbers, even assuming one number per line at only 1 byte each, you would need 10 * 1000 * 1000 * 1000 bytes, about 10 GB of memory. Stored as 4-byte Java ints it would be about 40 GB. Reading everything into memory to sort it is therefore unlikely to work.
So the only option for sorting is an external sort with a multiway merge, from which you take the first 100 minimums.
That is: read part of the file, sort it, and write it out to a small temporary file, then repeat for the rest of the file. Finally, open all of the small sorted temporary files and repeatedly take the smallest value among their current read positions. The file pointer of the file that supplied the value moves forward; the pointers of the other files stay where they are. Stop once you have 100 values, then delete the temporary files.
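The two phases described above can be sketched roughly as follows. This is a minimal illustration and not the poster's code: the class name `ExternalTopK`, the method names, and the `chunkSize` parameter are all made up for the example, and it assumes the input holds one integer per line.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class ExternalTopK {
    // Phase 1: read chunkSize numbers at a time, sort each chunk, write it to a temp file.
    static List<Path> writeSortedRuns(Path input, int chunkSize) throws IOException {
        List<Path> runs = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input)) {
            int[] chunk = new int[chunkSize];
            int n = 0;
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) continue;
                chunk[n++] = Integer.parseInt(line);
                if (n == chunkSize) { runs.add(flush(chunk, n)); n = 0; }
            }
            if (n > 0) runs.add(flush(chunk, n));
        }
        return runs;
    }

    // Sorts the first n entries of chunk and writes them, one per line, to a new temp file.
    static Path flush(int[] chunk, int n) throws IOException {
        Arrays.sort(chunk, 0, n);
        Path run = Files.createTempFile("run", ".txt");
        try (BufferedWriter out = Files.newBufferedWriter(run)) {
            for (int i = 0; i < n; i++) {
                out.write(Integer.toString(chunk[i]));
                out.newLine();
            }
        }
        return run;
    }

    // Phase 2: multiway merge over the sorted runs, stopping after k values.
    static int[] smallest(List<Path> runs, int k) throws IOException {
        List<BufferedReader> readers = new ArrayList<>();
        // Heap entries are {value, index of the run it came from}, ordered by value.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        int[] result = new int[k];
        int count = 0;
        try {
            for (int i = 0; i < runs.size(); i++) {
                BufferedReader r = Files.newBufferedReader(runs.get(i));
                readers.add(r);
                String line = r.readLine();
                if (line != null) heap.add(new int[] {Integer.parseInt(line), i});
            }
            while (count < k && !heap.isEmpty()) {
                int[] top = heap.poll();
                result[count++] = top[0];
                // Only the run that supplied the value advances; the others stay put.
                String line = readers.get(top[1]).readLine();
                if (line != null) heap.add(new int[] {Integer.parseInt(line), top[1]});
            }
        } finally {
            for (BufferedReader r : readers) r.close();
            for (Path p : runs) Files.deleteIfExists(p); // remove the temporary files
        }
        return Arrays.copyOf(result, count);
    }
}
```

Calling `writeSortedRuns(input, chunkSize)` and then `smallest(runs, 100)` gives the 100 smallest values, duplicates included. `chunkSize` would be chosen to fit the available memory.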

CodePudding user response:

Reference to 1st floor qybao's response:
With 10 billion numbers, even assuming one number per line at only 1 byte each, you would need 10 * 1000 * 1000 * 1000 bytes, about 10 GB of memory, so reading everything into memory to sort it is unlikely to work.
So the only option for sorting is an external sort with a multiway merge, from which you take the first 100 minimums.
That is: read part of the file, sort it, and write it out to a small temporary file, then repeat for the rest of the file. Finally, open all of the small sorted temporary files and repeatedly take the smallest value among their current read positions. The file pointer of the file that supplied the value moves forward; the pointers of the other files stay where they are. Stop once you have 100 values, then delete the temporary files.
Partition method?

CodePudding user response:

1. Set the amount of data processed per batch according to the machine's available memory.
2. Take the first 100 values as the initial minimums.
3. Sort these 100 values in ascending order.
4. Take each remaining value in turn and compare it with the 100 current values from front to back. If it is smaller than the value at some position, insert it at that position, shift the values behind it back by one, and drop the last value.
5. The rest is only a matter of time and effort: the 100 values left at the end are the minimums.
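A common variant of the steps above keeps the 100 candidates in a max-heap instead of a sorted array, so each replacement costs O(log k) rather than a shift of the whole array. The sketch below shows that swapped-in technique, not the sorted-array insertion described in the list. The class and method names are made up, and it takes an `int[]` only to stay short; a real run would feed values from the file as they are read.

```java
import java.util.Collections;
import java.util.PriorityQueue;

class SmallestK {
    // Returns the k smallest values of data in ascending order (duplicates kept).
    // Assumes k >= 1.
    static int[] smallest(int[] data, int k) {
        // Max-heap: the root is the largest of the current candidates.
        PriorityQueue<Integer> heap = new PriorityQueue<>(k, Collections.reverseOrder());
        for (int x : data) {
            if (heap.size() < k) {
                heap.add(x);                 // still filling the first k slots
            } else if (x < heap.peek()) {
                heap.poll();                 // drop the current largest candidate
                heap.add(x);                 // and keep the smaller value instead
            }
        }
        int[] out = new int[heap.size()];
        for (int i = out.length - 1; i >= 0; i--) {
            out[i] = heap.poll();            // largest first, so fill from the back
        }
        return out;
    }
}
```

Memory use stays at k values no matter how large the input is, which is what makes a single pass over a 10-billion-line file feasible.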

CodePudding user response:

If sorting is the main concern, merge sort is the main algorithm to use.

CodePudding user response:

Reference to 2nd floor qq_33283446's response:
Partition method?

No. You are only looking for the 100 smallest numbers, so there is no need to sort. Traverse the file once while keeping an int[100] array, min, of the smallest values found so far. Compare each number you read with the elements of min; if it is smaller than one of them, insert it into the array.
For example, finding the 3 smallest distinct numbers:
#include <limits.h>
#include <stdio.h>

int main(void) {
    int array[] = {5, 8, 6, 7, 9, 4, 1, 3, 2, 0}; // find the 3 smallest distinct numbers
    int min[] = {INT_MAX, INT_MAX, INT_MAX}; // start at the maximum so that any input value is smaller
    int i = 0, j = 0;
    for (int k = 0; k < 10; k++) {
        for (i = 2; i >= 0 && min[i] > array[k]; i--); // compare array[k] with min, from back to front
        if (i < 0) { // smaller than everything in min: insert at the front and shift the rest back
            for (j = 2; j > 0; j--) min[j] = min[j - 1];
            min[0] = array[k];
        } else if (i != 2 && min[i] != array[k]) { // belongs after position i and is not already in min: insert at i + 1 and shift the rest back
            for (j = 2; j > i + 1; j--) min[j] = min[j - 1];
            min[i + 1] = array[k];
        }
    }
    for (i = 0; i < 3; i++)
        printf("%d ", min[i]);
    return 0;
}
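Since the question asks for Java, the same single-pass idea for distinct values might look like the sketch below. It swaps the hand-written array insertion for a `TreeSet`, which keeps values sorted and ignores duplicates. The class name `DistinctSmallest` and the method name are made up for the example, and it assumes one integer per line and k >= 1.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.TreeSet;

class DistinctSmallest {
    // Reads one integer per line and returns the k smallest distinct values.
    static TreeSet<Integer> smallestDistinct(BufferedReader in, int k) throws IOException {
        TreeSet<Integer> best = new TreeSet<>();
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) continue;
            int n = Integer.parseInt(line);
            if (best.size() < k) {
                best.add(n);                 // duplicates are ignored by the set
            } else if (n < best.last() && best.add(n)) {
                best.pollLast();             // a new smaller value came in: drop the largest
            }
        }
        return best;
    }
}
```

For a real file the reader could come from `Files.newBufferedReader(path)`; the set never holds more than k + 1 values at a time.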


With this algorithm you traverse the file only once. If the file is too large, consider splitting it or reading it with multiple threads.
Here is a simple example you can use as a reference:

import java.io.RandomAccessFile;
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;

public class Sample {
    public static void main(String[] args) {
        try {
            String file = "D:/test.txt";
            int threadCount = Runtime.getRuntime().availableProcessors(); // number of available CPUs, used as the thread count
            int[][] buf = new int[threadCount][100]; // each thread finds its own 100 smallest numbers (separate data areas per thread avoid lock contention)
            for (int i = 0; i < threadCount; i++) {
                Arrays.fill(buf[i], Integer.MAX_VALUE); // initialize each thread's data area
            }
            CountDownLatch countDown = new CountDownLatch(threadCount); // the main thread waits on this; each worker decrements it by 1 when it ends
            // createTestFile(file, 10000); // generate a test file
            class minThread extends Thread { // thread class for multi-threaded file reading
                int id = 0; // index of this thread's data area
                long start = 0; // position where this thread starts reading
                long end = 0; // position where this thread stops reading
                public minThread(int id, long start, long end) {
                    this.id = id;
                    this.start = start;
                    this.end = end;
                }
                public void run() {
                    RandomAccessFile raf = null;
                    try {
                        raf = new RandomAccessFile(file, "r");
                        raf.seek(start); // move to this thread's read area
                        while (start < end) { // read line by line from start to end
                            String s = raf.readLine();
                            if (s == null) break;
                            start = raf.getFilePointer(); // actual position, correct for both \n and \r\n line endings
                            s = s.trim();
                            if (s.isEmpty()) continue;

                            int n = Integer.parseInt(s), i = 0; // keep the 100 smallest distinct numbers
                            for (i = 99; i >= 0 && buf[id][i] > n; i--);
                            if (i < 0) {
                                for (int j = 99; j > 0; j--) buf[id][j] = buf[id][j - 1];
                                buf[id][0] = n;
                            } else if (i != 99 && buf[id][i] != n) {
                                for (int j = 99; j > i + 1; j--) buf[id][j] = buf[id][j - 1];
                                buf[id][i + 1] = n;
                            }
                        }
                    } catch (Throwable e) {
                        e.printStackTrace();
                    } finally {
                        if (raf != null) {
                            try {
                                raf.close();
                            } catch (Throwable ee) {
                                ee.printStackTrace();
                            }
                        }
                        countDown.countDown(); // decrement even on failure so the main thread never waits forever
                    }
                }
            }
            RandomAccessFile raf = new RandomAccessFile(file, "r");
            long totalLength = raf.length(); // total file length
            long offset = totalLength / threadCount;
            long start = 0, end = 0;
            for (int i = 0, j = 0; i < threadCount - 1; i++) {
                raf.seek(start + offset); // jump close to the end of this thread's area
                end = 0;
                while ((j = raf.read()) != -1 && j != '\n') { // extend the area to the end of the current line
                    end++;
                }
                end += (start + offset + 1); // first byte after the line break
                new minThread(i, start, end).start(); // assign the read area to a thread
                start = end; // the next area begins right after the line break
            }
            end = totalLength;
            new minThread(threadCount - 1, start, end).start();
            raf.close();

            countDown.await(); // wait for all threads to end
            int[] min = new int[100]; // final result
            int[] cur = new int[threadCount], idx = new int[threadCount];
            Arrays.fill(cur, 0);
            for (int i = 0; i < 100; i++) { // merge the per-thread results into the overall 100 smallest numbers
                int tmp = buf[0][cur[0]]; // take a candidate
                Arrays.fill(idx, 0);
                idx[0] = 1; // idx[j] == 1 marks thread j as holding the currently selected number
                for (int j = 1; j < threadCount; j++) {
                    if (tmp > buf[j][cur[j]]) { // another thread has a smaller number: select that one instead
                        tmp = buf[j][cur[j]];
                        Arrays.fill(idx, 0);
                        idx[j] = 1;
                    } else if (tmp == buf[j][cur[j]]) { // equal values are marked as selected too, so duplicates are dropped
                        idx[j] = 1;
                    }
                }
                min[i] = tmp; // save the smallest number
                for (int j = 0; j < threadCount; j++) {
                    // The original post is cut off inside this loop; everything from here on is an assumed completion.
                    if (idx[j] == 1) cur[j]++; // advance every thread that supplied the selected value
                }
            }
            System.out.println(Arrays.toString(min));
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }
}
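The commented-out `createTestFile(file, 10000)` call refers to a helper whose body does not appear in the post. A hypothetical sketch of such a helper, assuming it simply writes the given number of random integers, one per line (the class name `TestFileGenerator` is made up):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Random;

class TestFileGenerator {
    // Writes `count` random ints to `file`, one per line, using '\n' line endings.
    static void createTestFile(String file, int count) throws IOException {
        Random random = new Random();
        try (BufferedWriter out = Files.newBufferedWriter(Paths.get(file))) {
            for (int i = 0; i < count; i++) {
                out.write(Integer.toString(random.nextInt()));
                out.write('\n');
            }
        }
    }
}
```

A small file produced this way makes it easy to compare the multithreaded result against a plain sort of the same data.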