I’ve got a really large file, circa 10m rows, in which I’m trying to populate a column based on conditions on another column via a jsee macro. While it is quite quick for small files, it does take some time for the large file.
//pseudocode
//No sorting on Col1, which can have empty cells too
For all lines in file
IF (cell in Col2 IS empty) AND (cell in Col1 IS NOT empty) AND (cell in Col1 = previous cell in Col1)
THEN cell in Col2 = previous cell in Col2
//jsee code
document.CellMode = true; // Must be cell selection mode
totalLines = document.GetLines();
for( i = 1; i < totalLines; i ) {
nref = document.GetCell( i, 1, eeCellIncludeNone );
gsize = document.GetCell( i, 2, eeCellIncludeNone );
if (gsize == "" && nref != "" && nref == document.GetCell( i-1, 1, eeCellIncludeNone ) ) {
document.SetCell( i, 2, document.GetCell( i-1, 2, eeCellIncludeNone ) , eeAutoQuote);
}
}
Input File:
Reference | Group Size |
---|---|
14/12/01819 | 1 |
14/12/01820 | 1 |
15/01/00191 | 4 |
15/01/00191 | |
15/01/00191 | |
15/01/00198 | |
15/01/00292 | 3 |
15/01/00292 | |
15/01/00292 | |
15/01/00401 | 5 |
15/01/00401 | |
15/01/00402 | |
1 | |
15/01/00403 | 2 |
15/01/00403 | |
15/01/00403 | |
15/01/00403 | |
15/01/00404 | |
20/01/01400 | 1 |
Output File:
Reference | Group Size |
---|---|
14/12/01819 | 1 |
14/12/01820 | 1 |
15/01/00191 | 4 |
15/01/00191 | 4 |
15/01/00191 | 4 |
15/01/00198 | |
15/01/00292 | 3 |
15/01/00292 | 3 |
15/01/00292 | 3 |
15/01/00401 | 5 |
15/01/00401 | 5 |
15/01/00402 | |
1 | |
15/01/00403 | 2 |
15/01/00403 | 2 |
15/01/00403 | 2 |
15/01/00403 | 2 |
15/01/00404 | |
20/01/01400 | 1 |
Any ideas on how to optimise this and make it run even faster?
CodePudding user response:
I wrote a JavaScript for EmEditor macro for you. You might need to set the correct numbers in the first 2 lines for iColReference and iColGroupSize.
iColReference = 1; // the column index of "Reference"
iColGroupSize = 2; // the column index of "Group Size"
document.CellMode = true; // Must be cell selection mode
sDelimiter = document.Csv.Delimiter; // retrieve the delimiter
nOldHeadingLines = document.HeadingLines; // retrieve old headings
document.HeadingLines = 0; // set No Headings
yBottom = document.GetLines(); // retrieve the number of lines
if( document.GetLine( yBottom ).length == 0 ) { // -1 if the last line is empty
--yBottom;
}
str = document.GetColumn( iColReference, sDelimiter, eeCellIncludeQuotes, 1, yBottom ); // get whole 1st column from top to bottom, separated by TAB
sCol1 = str.split( sDelimiter );
str = document.GetColumn( iColGroupSize, sDelimiter, eeCellIncludeQuotes, 1, yBottom ); // get whole 2nd column from top to bottom, separated by TAB
sCol2 = str.split( sDelimiter );
s1 = "";
s2 = "";
for( i = 0; i < yBottom; i ) { // loop through all lines
if( sCol2[i].length != 0 ) {
s1 = sCol1[i];
s2 = sCol2[i];
}
else {
if( s1.length != 0 && sCol1[i] == s1 ) { // same value as previous line, copy s2
if( s2.length != 0 ) {
sCol2[i] = s2;
}
}
else { // different value, empty s1 and s2
s1 = "";
s2 = "";
}
}
}
str = sCol2.join( sDelimiter );
document.SetColumn( iColGroupSize, str, sDelimiter, eeDontQuote ); // set whole 2nd column from top to bottom with the new values
document.HeadingLines = nOldHeadingLines; // restore the original number of headings
To run this, save this code as, for instance, Macro.jsee
, and then select this file from Select... in the Macros menu. Finally, select Run Macro.jsee in the Macros menu.