Quick sort?

Time:10-06

I have a CSV file of more than 300 MB with 12 columns and about 20 million records. How can I remove the first 10 million lines, sort the rest by the second column, and save the result to a new CSV file?

CodePudding user response:

Use Excel if you don't write programs. Of course, if you have enough money to hire a programmer to develop it for you, that is another matter.

CodePudding user response:

Reading a large file is not easy in itself, and sorting that much data will challenge your ability.

CodePudding user response:

Quoting caozhy (1st floor):
Use Excel if you don't write programs. Of course, if you have enough money to hire a programmer to develop it for you, that is another matter.


Excel can handle only about 1 million rows, and it would be painfully slow.

CodePudding user response:

Your "big" file is only a little over 300 MB. In an era when even phones commonly have 1 GB of memory, reading it all into memory at once and then sorting it is no problem. There is not much to discuss; just do it according to the basic concepts of data structures and sorting algorithms.
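The in-memory approach described above could be sketched roughly as follows, once the lines are already loaded into an array of strings. The names `second_field`, `cmp_by_col2` and `sort_lines_by_col2` are made up for illustration. The sketch assumes the second column is compared as plain text (the question does not say whether it is numeric) and that no field contains a quoted comma.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the second comma-separated
   field of `line` and store its length in *len. Assumes no quoted
   fields containing commas. A line without a comma yields an empty field. */
static const char *second_field(const char *line, size_t *len) {
    const char *start = strchr(line, ',');
    if (start == NULL) {
        *len = 0;
        return line + strlen(line);
    }
    start++;                              /* step past the first comma */
    *len = strcspn(start, ",\r\n");       /* field ends at next comma or line end */
    return start;
}

/* qsort comparator for an array of char* lines: compares the second
   fields byte by byte (assumption: text order, not numeric order). */
static int cmp_by_col2(const void *a, const void *b) {
    const char *la = *(const char *const *)a;
    const char *lb = *(const char *const *)b;
    size_t na, nb;
    const char *fa = second_field(la, &na);
    const char *fb = second_field(lb, &nb);
    size_t m = na < nb ? na : nb;
    int r = memcmp(fa, fb, m);
    if (r != 0) return r;
    return (na > nb) - (na < nb);         /* shorter field sorts first on a tie */
}

/* Sort an in-memory array of line pointers by the second column. */
void sort_lines_by_col2(char **lines, size_t n) {
    qsort(lines, n, sizeof *lines, cmp_by_col2);
}
```

Sorting pointers rather than fixed-width rows means only the pointers are moved during the sort, which matters with millions of lines. Note that `qsort` is not guaranteed to be stable, so rows with equal keys may change their relative order.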

CodePudding user response:

For reference only, although it is in C:
// Sort the lines of file 1, remove duplicates, and save the result to file 2
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXCHARS 128 // the widest line that can be handled, including the end-of-line \n and the terminating \0
int MAXLINES=10000,MAXLINES2;
char *buf,*buf2;
int c,n,hh,i,L;
FILE *f;
char ln[MAXCHARS];
int ignore_case=0;
// stricmp is non-standard (Windows compilers); on POSIX use strcasecmp from <strings.h>
int icompare(const void *arg1,const void *arg2) {
    return stricmp((char *)arg1,(char *)arg2);
}
int compare(const void *arg1,const void *arg2) {
    return strcmp((char *)arg1,(char *)arg2);
}
int main(int argc,char **argv) {
    if (argc<3) {
        printf("Unique line. Designed by [email protected]. 2012-08-20\n");
        printf("Usage: %s src.txt uniqued.txt [-i]\n",argv[0]);
        return 1;
    }
    if (argc>3) ignore_case=1;// if there is a third command-line parameter, ignore case
    f=fopen(argv[1],"r");
    if (NULL==f) {
        printf("Can not find the file %s!\n",argv[1]);
        return 1;
    }
    buf=(char *)malloc(MAXLINES*MAXCHARS);
    if (NULL==buf) {
        fclose(f);
        printf("Can not malloc(%d LINES*%d CHARS)!\n",MAXLINES,MAXCHARS);
        return 2;
    }
    n=0;
    hh=0;
    i=0;
    while (1) {
        if (NULL==fgets(ln,MAXCHARS,f)) break;//
        hh++;
        L=strlen(ln)-1;
        if ('\n'!=ln[L]) {// over-long line: ignore the rest of it
            printf("%s Line %d too long(>%d),spilth ignored.\n",argv[1],hh,MAXCHARS);
            while (1) {
                c=fgetc(f);
                if ('\n'==c || EOF==c) break;//
            }
        }
        while (1) {// strip the '\n' and spaces at the end of the line
            if ('\n'==ln[L] || ' '==ln[L]) {
                ln[L]=0;
                L--;
                if (L<0) break;//
            } else break;//
        }
        if (L>=0) {// store non-empty lines in fixed-width slots of MAXCHARS bytes
            strcpy(buf+i,ln);i+=MAXCHARS;
            n++;
            if (n>=MAXLINES) {// buffer full: double it
                // sizes and offsets are int here, so the buffer must stay below 2 GB
                MAXLINES2=MAXLINES*2;
                if (MAXLINES2==1280000) MAXLINES2=2500000;
                buf2=(char *)realloc(buf,MAXLINES2*MAXCHARS);
                if (NULL==buf2) {
                    printf("Can not malloc(%d LINES*%d CHARS)!\n",MAXLINES2,MAXCHARS);
                    printf("WARNING: Lines >%d ignored.\n",MAXLINES);
                    break;//
                }
                buf=buf2;
                MAXLINES=MAXLINES2;
            }
        }
    }
    fclose(f);
    if (n>1) {
        if (ignore_case) qsort(buf,n,MAXCHARS,icompare);
        else qsort(buf,n,MAXCHARS,compare);
    }
    f=fopen(argv[2],"w");
    if (NULL==f) {
        free(buf);
        printf("Can not create the file %s!\n",argv[2]);
        return 2;
    }
    if (n>0) fprintf(f,"%s\n",buf);// the first line is always unique; skip if nothing was read
    if (n>1) {
        if (ignore_case) {
            hh=0;
            L=MAXCHARS;
            for (i=1;i<n;i++) {// write a line only if it differs from the previous one
                if (stricmp((const char *)buf+hh,(const char *)buf+L)) {
                    fprintf(f,"%s\n",buf+L);
                }
                hh=L;
                L+=MAXCHARS;
            }
        } else {
            hh=0;
            L=MAXCHARS;
            for (i=1;i<n;i++) {
                if (strcmp((const char *)buf+hh,(const char *)buf+L)) {
                    fprintf(f,"%s\n",buf+L);
                }
                hh=L;
                L+=MAXCHARS;
            }
        }
    }
    fclose(f);
    free(buf);
    return 0;
}
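
The program above sorts and deduplicates whole lines, while the original question also asks to drop the first 10 million lines before sorting. A minimal sketch of that step, assuming lines are terminated by `\n`; the helper name `skip_lines` is made up for illustration and is not part of the program above:

```c
#include <stdio.h>

/* Hypothetical helper: consume up to n lines from `in` and return how
   many complete lines were actually skipped. A final line without a
   trailing '\n' is consumed but not counted. */
long skip_lines(FILE *in, long n) {
    long skipped = 0;
    int c;
    while (skipped < n && (c = getc(in)) != EOF) {
        if (c == '\n') skipped++;
    }
    return skipped;
}
```

Calling `skip_lines(f, 10000000L)` right after `fopen` would leave the stream positioned at the first line to keep, so the existing `fgets` loop could then read the remaining records unchanged.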

CodePudding user response:

http://bbs.csdn.net/topics/340173969

CodePudding user response:

Use ADO directly: treat the CSV file as a data source, and all you have to do is run a query with a sort.

CodePudding user response:

Process it line by line. I suggest inserting the rows into a SQL database, sorting them there, and then exporting the result as CSV; that has nothing to do with VB.
If you do use VB, it will test your algorithm and programming experience. Of course it is not difficult, the execution time will just be a little longer.

CodePudding user response:

Quoting nanfei01055 (8th floor):
Process it line by line. I suggest inserting the rows into a SQL database, sorting them there, and then exporting the result as CSV; that has nothing to do with VB.
If you do use VB, it will test your algorithm and programming experience. Of course it is not difficult, the execution time will just be a little longer.

Why do you say that handling it in VB would be slower than your SQL approach?


Then again, VB6 does seem to have a lower barrier to entry.
Many people with little programming skill insist on writing a few lines of VB code,
and in the end garbage code is flying everywhere...