Home > Net >  Optimised EmEditor macro to populate column based on another column for a large file
Optimised EmEditor macro to populate column based on another column for a large file

Time:05-27

I’ve got a really large file, circa 10m rows, in which I’m trying to populate a column based on conditions on another column via a jsee macro. While it is quite quick for small files, it does take some time for the large file.

//pseudocode
//No sorting on Col1, which can have empty cells too
For all lines in file
     IF (cell in Col2 IS empty) AND (cell in Col1 IS NOT empty) AND (cell in Col1 = previous cell in Col1)
          THEN cell in Col2 = previous cell in Col2

//jsee code
document.CellMode = true;   // Must be cell selection mode
totalLines = document.GetLines();
    
for( i = 1; i < totalLines; i   ) {

     nref = document.GetCell( i, 1, eeCellIncludeNone );
     gsize = document.GetCell( i, 2, eeCellIncludeNone );

     if (gsize == "" && nref != "" && nref == document.GetCell( i-1, 1, eeCellIncludeNone ) ) {
          document.SetCell( i, 2, document.GetCell( i-1, 2, eeCellIncludeNone ) , eeAutoQuote);
      }
 }

Input File:

Reference Group Size
14/12/01819 1
14/12/01820 1
15/01/00191 4
15/01/00191
15/01/00191
15/01/00198
15/01/00292 3
15/01/00292
15/01/00292
15/01/00401 5
15/01/00401
15/01/00402
1
15/01/00403 2
15/01/00403
15/01/00403
15/01/00403
15/01/00404
20/01/01400 1

Output File:

Reference Group Size
14/12/01819 1
14/12/01820 1
15/01/00191 4
15/01/00191 4
15/01/00191 4
15/01/00198
15/01/00292 3
15/01/00292 3
15/01/00292 3
15/01/00401 5
15/01/00401 5
15/01/00402
1
15/01/00403 2
15/01/00403 2
15/01/00403 2
15/01/00403 2
15/01/00404
20/01/01400 1

Any ideas on how to optimise this and make it run even faster?

CodePudding user response:

I wrote a JavaScript for EmEditor macro for you. You might need to set the correct numbers in the first 2 lines for iColReference and iColGroupSize.

iColReference = 1;   // the column index of "Reference"
iColGroupSize = 2;   // the column index of "Group Size"
document.CellMode = true;   // Must be cell selection mode
sDelimiter = document.Csv.Delimiter;  // retrieve the delimiter
nOldHeadingLines = document.HeadingLines;  // retrieve old headings
document.HeadingLines = 0;   // set No Headings
yBottom = document.GetLines();   // retrieve the number of lines
if( document.GetLine( yBottom ).length == 0 ) {  // -1 if the last line is empty
    --yBottom;
}
str = document.GetColumn( iColReference, sDelimiter, eeCellIncludeQuotes, 1, yBottom );  // get whole 1st column from top to bottom, separated by TAB
sCol1 = str.split( sDelimiter );
str = document.GetColumn( iColGroupSize, sDelimiter, eeCellIncludeQuotes, 1, yBottom );  // get whole 2nd column from top to bottom, separated by TAB
sCol2 = str.split( sDelimiter );

s1 = "";
s2 = "";
for( i = 0; i < yBottom;   i ) {  // loop through all lines
    if( sCol2[i].length != 0 ) {
        s1 = sCol1[i];
        s2 = sCol2[i];
    }
    else {
        if( s1.length != 0 && sCol1[i] == s1 ) {  // same value as previous line, copy s2
            if( s2.length != 0 ) {
                sCol2[i] = s2;
            }
        }
        else {  // different value, empty s1 and s2
            s1 = "";
            s2 = "";
        }
    }
}

str = sCol2.join( sDelimiter );
document.SetColumn( iColGroupSize, str, sDelimiter, eeDontQuote );  // set whole 2nd column from top to bottom with the new values
document.HeadingLines = nOldHeadingLines;   // restore the original number of headings

To run this, save this code as, for instance, Macro.jsee, and then select this file from Select... in the Macros menu. Finally, select Run Macro.jsee in the Macros menu.

  • Related