I have an Elasticsearch index containing about 100k rows, and I want to paginate through about 30k of them.
The error I get is about max_result_window.
In this case I cannot get records 23000 to 23004 because that exceeds the 10k-record limit. Is there a workaround?
CodePudding user response:
A possible workaround that I found is to use the scroll API. In practice I scroll 20 records at a time (one page) until I reach page 51711. It takes about 10 minutes, because it has to scroll through all the preceding data before reaching the target range, records 1070100 to 1070120.
from elasticsearch import Elasticsearch

url = "http://localhost:9200"
index = "civile"
pageLimit = 20
bodyPageAllDocBil = {"query": {"bool": {"must": [], "should": []}}, "_source": ["annoruolo", "annosentenza", "cf_giudice", "codiceoggetto", "controparte", "gradogiudizio", "nomegiudice", "parte", "distretto"]}

# Copy the body before dropping "_source": a plain assignment would alias
# the same dict and remove "_source" from the search body as well.
bodyCountAllDoc = dict(bodyPageAllDocBil)
bodyCountAllDoc.pop("_source", None)

# One client is enough; it can be reused for every request.
es = Elasticsearch(url)
res = es.count(index=index, body=bodyCountAllDoc)
sizeCount = res["count"]

bodyPageAllDocs = dict(bodyPageAllDocBil)
bodyPageAllDocs["size"] = pageLimit
# The initial search returns the first page and opens the scroll context.
docs = es.search(index=index, body=bodyPageAllDocs, scroll="10m")
currentSize = pageLimit
scrollId = docs["_scroll_id"]

page = 51711
paginationStart = (page - 1) * pageLimit
while currentSize <= paginationStart + pageLimit:
    docs = es.scroll(scroll_id=scrollId, scroll="10m")
    countRec = len(docs["hits"]["hits"])
    if countRec == 0:
        # No more results: stop instead of looping forever.
        break
    if currentSize == paginationStart:
        # This batch holds the records of the requested page.
        print(docs["hits"]["hits"][0])
        print(docs["hits"]["hits"][1])
        #...
    currentSize = currentSize + countRec
    scrollId = docs["_scroll_id"]
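The same idea can be wrapped in a small helper. This is only a sketch under a few assumptions: the name fetch_page_with_scroll and its parameters are hypothetical (not part of the Elasticsearch client), and client is assumed to be any object exposing search, scroll and clear_scroll the way the Python Elasticsearch client does. It returns as soon as the requested page is reached and releases the scroll context afterwards, but it still has to walk through every earlier page, so it will not be faster than the loop above.

```python
def fetch_page_with_scroll(client, index, body, page, page_limit, keep_alive="10m"):
    """Return the hits of the 1-based `page` by scrolling from the first page.

    Hypothetical helper: `client` is assumed to behave like an
    elasticsearch.Elasticsearch instance (search, scroll, clear_scroll).
    """
    if page < 1:
        raise ValueError("page must be >= 1")

    # Work on a copy so the caller's query body is left untouched.
    request = dict(body)
    request["size"] = page_limit

    # The initial search returns page 1 and opens the scroll context.
    docs = client.search(index=index, body=request, scroll=keep_alive)
    scroll_id = docs["_scroll_id"]
    try:
        current_page = 1
        hits = docs["hits"]["hits"]
        # Advance one page per scroll call; stop early if results run out.
        while current_page < page and hits:
            docs = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = docs["_scroll_id"]
            hits = docs["hits"]["hits"]
            current_page += 1
        return hits
    finally:
        # Free the server-side scroll context instead of waiting for the timeout.
        client.clear_scroll(scroll_id=scroll_id)
```

With the variables from the answer above, a call could look like fetch_page_with_scroll(es, index, bodyPageAllDocBil, 51711, pageLimit); an empty list means the requested page is past the last record.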