How to get a page of records from 23000 to 23004 in Elasticsearch

Time:04-19

I have an Elasticsearch database containing about 100k rows. I want to paginate through about 30k of them.

The error I get refers to the index.max_result_window limit.

In this case I cannot get records 23000 to 23004, because that exceeds the 10k-record limit. Is there a workaround?
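
To make the limit concrete: a from/size search is rejected when from + size is larger than index.max_result_window, which defaults to 10000. A minimal sketch of that arithmetic follows; the helper name exceeds_result_window is hypothetical and not part of any client library.

```python
# Default value of the index.max_result_window setting.
DEFAULT_MAX_RESULT_WINDOW = 10_000


def exceeds_result_window(start, size, max_result_window=DEFAULT_MAX_RESULT_WINDOW):
    """Return True if a from/size search for `size` records beginning at
    0-based offset `start` would be rejected by the result-window check."""
    return start + size > max_result_window


# Records 23000 to 23004 are 5 records starting at offset 23000.
print(exceeds_result_window(23000, 5))   # True: 23005 > 10000
print(exceeds_result_window(9995, 5))    # False: 10000 is still allowed
```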

CodePudding user response:

One workaround I found is to use the scroll API. In practice I scroll with a size of 20 (one page per request) until I reach page 51711. It takes about 10 minutes, because the scroll has to read through all the preceding data before it reaches start record 1070100 to record 1070120.

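import copy

from elasticsearch import Elasticsearch

url = "http://localhost:9200"
index = "civile"
pageLimit = 20

bodyPageAllDocBil = {"query": {"bool": {"must": [], "should": []}}, "_source": ["annoruolo", "annosentenza",  "cf_giudice","codiceoggetto", "controparte", "gradogiudizio", "nomegiudice", "parte","distretto"]}

# One client is enough; it is reused for every request below.
es = Elasticsearch(url)

# Count the matching documents. Work on a copy: a plain assignment would alias
# the dict, and pop() would then strip "_source" from the search body as well.
bodyCountAllDoc = copy.deepcopy(bodyPageAllDocBil)
bodyCountAllDoc.pop('_source', None)
res = es.count(index=index, body=bodyCountAllDoc)
sizeCount = res["count"]

bodyPageAllDocs = copy.deepcopy(bodyPageAllDocBil)
bodyPageAllDocs["size"] = pageLimit

# The initial search returns the first page and opens the scroll context.
docs = es.search(index=index, body=bodyPageAllDocs, scroll='10m')

currentSize = pageLimit  # number of records consumed so far
scrollId = docs["_scroll_id"]
page = 51711
paginationStart = (page - 1) * pageLimit

while currentSize <= paginationStart + pageLimit:
    docs = es.scroll(scroll_id=scrollId, scroll='10m')
    countRec = len(docs["hits"]["hits"])

    if countRec == 0:
        # The scroll is exhausted; stop instead of looping forever.
        break

    if currentSize == paginationStart:
        # This batch starts at record paginationStart: it is the requested page.
        print(docs["hits"]["hits"][0])
        print(docs["hits"]["hits"][1])
        #...

    currentSize = currentSize + countRec
    scrollId = docs['_scroll_id']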
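
Much of the running time is likely spent on round trips: with 20 records per scroll call, reaching page 51711 takes tens of thousands of requests. A possible variation, sketched here under assumptions, is to scroll in larger batches and slice the wanted page out of the stream. The helper name page_from_batches and the batch size of 1000 are hypothetical; batches can be any iterable of hit lists, for example the docs["hits"]["hits"] lists of successive scroll responses. The example uses fake records so it runs without a cluster.

```python
def page_from_batches(batches, page, page_limit):
    """Return the records of 1-based `page` from an iterable of batches.

    `batches` yields lists of hits of any size (e.g. the "hits" lists of
    successive scroll responses); batch size and page size are independent.
    """
    start = (page - 1) * page_limit   # 0-based offset of the first wanted record
    end = start + page_limit          # exclusive end offset
    seen = 0                          # records consumed before the current batch
    result = []
    for batch in batches:
        if not batch:                 # an empty batch means the scroll is exhausted
            break
        lo = max(start - seen, 0)
        hi = min(end - seen, len(batch))
        if lo < hi:                   # this batch overlaps the wanted page
            result.extend(batch[lo:hi])
        seen += len(batch)
        if seen >= end:               # the whole page has been collected
            break
    return result


# Stand-in for scroll responses: 10 batches of 1000 fake records each.
fake_batches = ([{"n": i} for i in range(b, b + 1000)] for b in range(0, 10_000, 1000))
hits = page_from_batches(fake_batches, page=151, page_limit=20)
print(hits[0], hits[-1])   # {'n': 3000} {'n': 3019}
```

With a batch size of 1000, offset (51711 - 1) * 20 = 1034200 would be reached after roughly 1,035 scroll calls rather than roughly 51,710, although every preceding record still has to be read.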