Improve performance of loading 100,000 records from database

We created a program to make using the database easier from other programs, so the code I'm showing is used in multiple other programs.

One of those other programs gets about 10,000 records from one of our clients and has to check whether these are already in our database. If not, we insert them into the database (they can also change, in which case they have to be updated).

To make this easy we load all the entries from our whole table (120,000 at the moment), create an object for every entry we get and put all of them into a HashMap.

Loading the whole table this way takes around 5 minutes. We also sometimes have to restart the program because we run into a GC overhead error, since we work on limited hardware. Do you have an idea of how we can improve the performance?

Here is the code to load all entries (we have a global limit of 10,000 entries per query, so we use a loop):

public Map<String, IMasterDataSet> getAllInformationObjects(ISession session) throws MasterDataException {
    IQueryExpression qe;
    IQueryParameter qp;
    
    // our main SDP class
    Constructor<?> constructorForSDPbaseClass = getStandardConstructor();
    
    SimpleDateFormat itaTimestampFormat = new SimpleDateFormat("yyyyMMddHHmmssSSS");
    
    // search in standard time range (modification date!)
    Calendar cal = Calendar.getInstance();
    cal.set(2010, Calendar.JANUARY, 1);
    Date startDate = cal.getTime();
    Date endDate = new Date();
    Long startDateL = Long.parseLong(itaTimestampFormat.format(startDate));
    Long endDateL = Long.parseLong(itaTimestampFormat.format(endDate));

    IDescriptor modDesc = IBVRIDescriptor.ModificationDate.getDescriptor(session);

    // count once before to determine initial capacities for hash map/set
    IBVRIArchiveClass SDP_ARCHIVECLASS = getMasterDataPropertyBag().getSDP_ARCHIVECLASS();
    qe = SDP_ARCHIVECLASS.getQueryExpression(session);
    qp = session.getDocumentServer().getClassFactory()
            .getQueryParameterInstance(session, new String[] {SDP_ARCHIVECLASS.getDatabaseName(session)}, null, null);        
    qp.setExpression(qe);  
    qp.setHitLimitThreshold(0);
    qp.setHitLimit(0);
    int nrOfHitsTotal = session.getDocumentServer().queryCount(session, qp, "*");
    int initialCapacity = (int) (nrOfHitsTotal / 0.75 + 1);

    // MD sets; and objects already done (here: document ID)
    HashSet<String> objDone = new HashSet<>(initialCapacity); 
    HashMap<String, IMasterDataSet> objRes = new HashMap<>(initialCapacity); 
    
    qp.close();
    
    // do queries until hit count is smaller than 10.000
    // use modification date
    
    boolean keepGoing = true;
    while(keepGoing) {
        // construct query expression
        // - basic part: Modification date & class type
        // a. doc. class type
        qe = SDP_ARCHIVECLASS.getQueryExpression(session);
        // b. ID
        qe = SearchUtil.appendQueryExpressionWithANDoperator(session, qe, 
                   new PlainExpression(modDesc.getQueryLiteral() + " BETWEEN " + startDateL + " AND " + endDateL));
        
        // 2. Query Parameter: set database; set expression
        qp = session.getDocumentServer().getClassFactory()
                .getQueryParameterInstance(session, new String[] {SDP_ARCHIVECLASS.getDatabaseName(session)}, null, null);
        
        qp.setExpression(qe);  
        
        // order by modification date; hitlimit = 0 -> no hitlimit, but the usual 10.000 max
        qp.setOrderByExpression(session.getDocumentServer().getClassFactory().getOrderByExpressionInstance(modDesc, true));
        qp.setHitLimitThreshold(0);
        qp.setHitLimit(0);

        // Do not sort by modification date;
        qp.setHints(" NoDefaultOrderBy");
        
        keepGoing = false;
        IInformationObject[] hits = null;
        IDocumentHitList hitList = null;
        hitList = session.getDocumentServer().query(qp, session);
        IDocument doc;
        if (hitList.getTotalHitCount() > 0) {
            hits = hitList.getInformationObjects();
            for (IInformationObject hit : hits) {
                String objID = hit.getID();
                if(!objDone.contains(objID)) {
                    // do something with this object and the class
                    // here: construct a new SDP sub class object and give it back via interface
                    doc = (IDocument) hit;
                    IMasterDataSet mdSet;
                    try {
                        mdSet = (IMasterDataSet) constructorForSDPbaseClass.newInstance(session, doc);
                    } catch (Exception e) {
                        // cause for this
                        String cause = (e.getCause() != null) ? e.getCause().toString() : MasterDataException.ERRMSG_PART_UNKNOWN;                            
                        throw new MasterDataException(MasterDataException.ERRMSG_NOINSTANCE_POSSIBLE, this.getClass().getSimpleName(), e.toString(), cause);
                    }                        
                    objRes.put(mdSet.getID(), mdSet);
                    objDone.add(objID);
                }                       
            }
            doc = (IDocument) hits[hits.length - 1];
            Date lastModDate = ((IDateValue) doc.getDescriptor(modDesc).getValues()[0]).getValue();
            startDateL = Long.parseLong(itaTimestampFormat.format(lastModDate));
        
            keepGoing = (hits.length >= 10000 || hitList.isResultSetTruncated());
        }
        qp.close();
    }   
    return objRes;
}

CodePudding user response:

Loading 120,000 rows (and more) each time will not scale very well, and your solution may stop working in the future as the number of records grows. Instead, let the database server handle the problem.

Your table needs a primary key or unique key based on the columns of the records. Iterate through the 10,000 records, performing a JDBC SQL update for each one that sets all field values, with a where clause that exactly matches the primary/unique key.

update BLAH set COL1 = ?, COL2 = ? where PKCOL = ?; // ... AND PKCOL2 =? ...

This modifies an existing row or does nothing at all, and JDBC executeUpdate() returns 0 or 1, the number of rows changed. If it returns 0, you have detected a new record that does not exist yet, so perform an insert for that record only.

insert into BLAH (COL1, COL2, ... PKCOL) values (?,?, ..., ?);

You can decide whether to run all 10,000 updates followed by however many inserts are needed, or to follow each update with an insert when one is required. Remember that JDBC batch statements and turning auto-commit off may help speed things up.
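
The "all updates first, then the inserts" variant could look roughly like the sketch below. It is only an illustration under assumptions: the class UpsertSketch, the Row type and both method names are made up for this example, BLAH, COL1, COL2 and PKCOL are the placeholder names from the statements above, and every column is treated as a string. It also assumes the JDBC driver reports a real per-row count from executeBatch(); some drivers return Statement.SUCCESS_NO_INFO instead, in which case the sketch gives up and you would fall back to one executeUpdate() per record.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

final class UpsertSketch {

    /** Hypothetical record shape: one key column and two value columns. */
    static final class Row {
        final String pk, col1, col2;
        Row(String pk, String col1, String col2) {
            this.pk = pk; this.col1 = col1; this.col2 = col2;
        }
    }

    /** Rows whose UPDATE changed nothing do not exist yet and still need an INSERT. */
    static List<Row> rowsNeedingInsert(List<Row> rows, int[] updateCounts) {
        List<Row> missing = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            if (updateCounts[i] == Statement.SUCCESS_NO_INFO) {
                // The driver did not tell us how many rows were changed.
                throw new IllegalStateException("no per-row update counts; update rows one by one instead");
            }
            if (updateCounts[i] == 0) {
                missing.add(rows.get(i));
            }
        }
        return missing;
    }

    /** Updates existing rows, inserts the rest, and returns the number of inserted rows. */
    static int upsert(Connection con, List<Row> rows) throws SQLException {
        boolean oldAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false); // one transaction instead of one commit per statement
        try (PreparedStatement upd = con.prepareStatement(
                     "update BLAH set COL1 = ?, COL2 = ? where PKCOL = ?");
             PreparedStatement ins = con.prepareStatement(
                     "insert into BLAH (COL1, COL2, PKCOL) values (?, ?, ?)")) {
            for (Row r : rows) {
                upd.setString(1, r.col1);
                upd.setString(2, r.col2);
                upd.setString(3, r.pk);
                upd.addBatch();
            }
            List<Row> missing = rowsNeedingInsert(rows, upd.executeBatch());
            for (Row r : missing) {
                ins.setString(1, r.col1);
                ins.setString(2, r.col2);
                ins.setString(3, r.pk);
                ins.addBatch();
            }
            if (!missing.isEmpty()) {
                ins.executeBatch();
            }
            con.commit();
            return missing.size();
        } catch (SQLException | RuntimeException e) {
            con.rollback(); // leave the table untouched if anything failed
            throw e;
        } finally {
            con.setAutoCommit(oldAutoCommit);
        }
    }
}
```

Note that this sketch expects each key to appear only once in the incoming list; if the same new key shows up twice, both updates report 0 rows and the second insert would violate the key constraint.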