txtai ElasticSearch Similarity slow

Time:08-06

I've been trying to run txtai in hopes of getting semantic search working with Elasticsearch. My main goal is to query tickets in a help desk system and return the tickets that are similar to my query.

Example Query: What operating system should I use?

This should return a list of results (similar to what Stack Overflow does when I type in the title of my question).

With txtai, I've noticed that it is extremely slow: requesting a single result takes almost 10 seconds, whereas Elasticsearch on its own returns 50 results almost instantly. Perhaps I am missing something about how this should perform.

I'll share the test code I'm currently working with:

from txtai.pipeline import Similarity
from elasticsearch import Elasticsearch, helpers

# Connect to ES instance
es = Elasticsearch(hosts=["http://localhost:9200"], timeout=60, retry_on_timeout=True)

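def ranksearch(query, limit):
  # Fetch 10x the requested number of hits, then re-rank them with the model
  results = [text for _, text in search(query, limit * 10)]
  # similarity() yields (index, score) pairs; map each index back to its text
  return [(score, results[x]) for x, score in similarity(query, results)][:limit]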

def search(query, limit):
  query = {
      "size": limit,
      "query": {
          "query_string": {"query": query}
      }
  }

  results = []
  for result in es.search(index="articles", body=query)["hits"]["hits"]:
    source = result["_source"]
    # Cap the Elasticsearch score at 18 and scale it into the range [0, 1]
    results.append((min(result["_score"], 18) / 18, source["title"]))
  return results

similarity = Similarity("valhalla/distilbart-mnli-12-3")

limit = 1
query = "Bad News"
print(ranksearch(query, limit))
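
To narrow down where the 10 seconds go, one option is to time the Elasticsearch query and the re-ranking step separately. The following is only a minimal sketch under assumptions: `timed`, `profile_ranksearch`, `fake_search` and `fake_similarity` are hypothetical names introduced here, and the two `fake_*` functions are stand-ins so the snippet runs without an Elasticsearch instance or a model. It mirrors the structure of `ranksearch` but is not a measurement of txtai itself.

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Hypothetical stand-in for search(): returns (score, title) pairs.
def fake_search(query, limit):
    return [(1.0, f"title {i}") for i in range(limit)]

# Hypothetical stand-in for similarity(): returns (index, score) pairs,
# best first, with a cost that grows with the number of candidates.
def fake_similarity(query, texts):
    time.sleep(0.01 * len(texts))
    return [(i, 1.0 / (i + 1)) for i in range(len(texts))]

def profile_ranksearch(query, limit, search_fn, similarity_fn):
    # Same flow as ranksearch: fetch limit * 10 candidates, then re-rank.
    hits, search_secs = timed(search_fn, query, limit * 10)
    texts = [text for _, text in hits]
    ranked, rerank_secs = timed(similarity_fn, query, texts)
    results = [(score, texts[i]) for i, score in ranked][:limit]
    timings = {
        "search": search_secs,
        "rerank": rerank_secs,
        "candidates": len(texts),
    }
    return results, timings

results, timings = profile_ranksearch("Bad News", 1, fake_search, fake_similarity)
print(results)
print(timings)
```

Passing the real `search` and `similarity` objects from the code above in place of the stand-ins would show how the total splits between the two stages. Note that even with `limit = 1`, the model is asked to score ten candidates.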

Any help is appreciated.

CodePudding user response:

This issue was also filed on GitHub and is being discussed there: https://github.com/neuml/txtai/issues/319
