Is there any way to retrieve Salesforce data from Python in chunks larger than 2,000 records? I have been using the REST API and following nextRecordsUrl to get each subsequent chunk, but with millions of records this approach takes a long time. I tried to find a Salesforce parameter that increases the number of records per chunk (>2,000 records) but haven't found one yet.
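For reference, the pagination loop I'm using looks roughly like this (a simplified sketch with the requests library; the instance URL, API version, and access token are placeholders):

```python
import requests

# Placeholders - fill in your org's instance URL, API version, and OAuth access token.
INSTANCE_URL = "https://yourinstance.my.salesforce.com"
API_VERSION = "v57.0"
HEADERS = {"Authorization": "Bearer <access_token>"}

def query_all(soql):
    """Run a SOQL query and follow nextRecordsUrl until all chunks are fetched."""
    url = f"{INSTANCE_URL}/services/data/{API_VERSION}/query"
    resp = requests.get(url, headers=HEADERS, params={"q": soql})
    resp.raise_for_status()
    body = resp.json()
    records = body["records"]

    # Each response only reveals the URL of the *next* chunk (at most ~2,000 records),
    # so the chunks have to be fetched strictly one after another.
    while not body["done"]:
        resp = requests.get(INSTANCE_URL + body["nextRecordsUrl"], headers=HEADERS)
        resp.raise_for_status()
        body = resp.json()
        records.extend(body["records"])
    return records

records = query_all("SELECT Id, Name FROM Account")
print(len(records))
```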
Another idea was to use multi-threading in Python to retrieve the chunks in parallel, but that would only work if all the nextRecordsUrl values were known up front, and it seems each one is only returned by the request for the previous chunk.
If you have other ideas, please suggest them. Note that I can't currently add filter conditions to the SOQL query to limit the data.
CodePudding user response:
You could look into the Bulk API query; it lets you retrieve data in 10K-record chunks. But it comes with a bit of a mindset shift: the normal REST API is synchronous (give me the next chunk, wait, give me the next chunk, wait), whereas with the Bulk API you submit a job and then periodically ask "is it done yet?" before downloading the results.
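A rough sketch of that flow against the Bulk API 2.0 query endpoint (plain requests; the instance URL, API version, token, and the process_csv helper are placeholders, and error handling is left out):

```python
import time
import requests

# Placeholders - fill in your org's details and token.
INSTANCE_URL = "https://yourinstance.my.salesforce.com"
API_VERSION = "v57.0"
HEADERS = {"Authorization": "Bearer <access_token>",
           "Content-Type": "application/json"}
JOBS_URL = f"{INSTANCE_URL}/services/data/{API_VERSION}/jobs/query"

def process_csv(chunk_text):
    """Hypothetical helper - do whatever you need with one CSV page of results."""
    print(len(chunk_text.splitlines()) - 1, "rows in this page")

# 1. Submit the query job; Salesforce processes it in the background.
job = requests.post(JOBS_URL, headers=HEADERS,
                    json={"operation": "query",
                          "query": "SELECT Id, Name FROM Account"}).json()
job_id = job["id"]

# 2. Poll until it's finished ("is it done yet?").
state = job["state"]
while state not in ("JobComplete", "Failed", "Aborted"):
    time.sleep(5)
    state = requests.get(f"{JOBS_URL}/{job_id}", headers=HEADERS).json()["state"]

# 3. Download the results as CSV, page by page, following the Sfdc-Locator header.
locator = None
while state == "JobComplete":
    params = {"maxRecords": 50000}
    if locator:
        params["locator"] = locator
    resp = requests.get(f"{JOBS_URL}/{job_id}/results", params=params,
                        headers={**HEADERS, "Accept": "text/csv"})
    process_csv(resp.text)
    locator = resp.headers.get("Sfdc-Locator")
    if not locator or locator == "null":
        break
```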
There's even a feature called "PK chunking" (splitting the result set into ranges by primary key, so the chunks can be extracted independently).
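If you go the Bulk API 1.0 route, PK chunking is just a header on the job-creation request (Bulk API 2.0 query jobs are supposed to handle the splitting internally, so you don't set it there). A minimal sketch, assuming the classic /services/async endpoint and a valid session ID as placeholders:

```python
import requests

# Placeholders - fill in your org's instance URL, API version, and session ID.
INSTANCE_URL = "https://yourinstance.my.salesforce.com"
API_VERSION = "57.0"            # note: no leading "v" on the Bulk API 1.0 path
SESSION_ID = "<session id>"

headers = {
    # Bulk API 1.0 authenticates with a session ID header rather than a Bearer token.
    "X-SFDC-Session": SESSION_ID,
    "Content-Type": "application/json",
    # Ask Salesforce to split the query into Id ranges of up to 100,000 records each.
    "Sforce-Enable-PKChunking": "chunkSize=100000",
}

job = requests.post(
    f"{INSTANCE_URL}/services/async/{API_VERSION}/job",
    headers=headers,
    json={"operation": "query", "object": "Account", "contentType": "CSV"},
).json()

# Each chunk then shows up as its own batch on this job, which you can poll
# and download independently (and in parallel, if you want).
print(job["id"])
```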
Consider going through this Trailhead module: https://trailhead.salesforce.com/content/learn/modules/large-data-volumes
And maybe play with Salesforce's Data Loader: query your data the normal way and measure the time, then again with the Bulk API option selected. That should give you an idea of where the bottleneck is and whether the bigger rewrite will gain you anything.