Home > other >  Manipulating ElasticSearch query result: "string indices must be integers"
Manipulating ElasticSearch query result: "string indices must be integers"

Time:12-25

I'll try to explain the problem as succinctly as possible. I'm trying to filter some values from a log file coming from Elastic. The log outputs this JSON exactly:

{
    "took": 2,
    "timed_out": False,
    "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "max_score": None,
        "hits": [
            {
                "_index": "winlogbeat-dc-2022.10.17-000014",
                "_type": "_doc",
                "_id": "vOCnfoQBeS2JF7giMG9q",
                "_score": None,
                "_source": {
                    "agent": {"hostname": "SRVDC1"},
                    "@timestamp": "2022-11-16T04:19:13.622Z",
                },
                "sort": [-9223372036854775808],
            },
            {
                "_index": "winlogbeat-dc-2022.10.17-000014",
                "_type": "_doc",
                "_id": "veCnfoQBeS2JF7giMG9q",
                "_score": None,
                "_source": {
                    "agent": {"hostname": "SRVDC1"},
                    "@timestamp": "2022-11-16T04:19:13.630Z",
                },
                "sort": [-9223372036854775808],
            },
        ],
    },
}

Now, I want to filter out only the _index and @timestamp keys. If I assign this JSON to a variable, I can perfectly filter out the two keys by running:

index = (data['hits']['hits'][0]['_index'])
timestamp = (data['hits']['hits'][0]['_source']['@timestamp'])

Output:

winlogbeat-dc*
2022-11-16T04:19:13.622Z

However, if I try to do the same directly from the server call, I get:

Traceback (most recent call last):
  File "c:\Users\user\Desktop\PYTHON\tiny2.py", line 96, in <module>
    query()
  File "c:\Users\user\Desktop\PYTHON\tiny2.py", line 77, in query
    index = (final_data['hits']['hits'][0]['_index'])
TypeError: string indices must be integers

Now, I understand the it's asking for integer values instead of the strings I'm using, but if I use integers, then I get individual characters rather than a key/value pair.

What am I missing?

UPDATE: Below is the entire code, but it won't help much. It contains Elastic's DSL query language, and a call to the server, which obviously you won't be able to connect to. I tried your suggestions, but I either get the same error, or a new one:

    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not ObjectApiResponse

Entire code as follows:

import os
import ast
import csv
import json
from elasticsearch import Elasticsearch
from datetime import datetime,timedelta
import datetime

ELASTIC_USERNAME = 'elastic'
ELASTIC_PASSWORD = "abc123"
PORT= str('9200')
HOST = str('10.20.20.131')
CERT = os.path.join(os.path.dirname(__file__),"cert.crt")

initial_time = datetime.datetime.now()
past_time = datetime.datetime.now() - (timedelta(minutes=15))

def query():
    try: #connection to Elastic server
        es = Elasticsearch(
            "https://10.20.20.131:9200",
            ca_certs = CERT,
            verify_certs=False,
            basic_auth = (ELASTIC_USERNAME, ELASTIC_PASSWORD)
        )
    except ConnectionRefusedError as error:
        print("[-] Connection error")
    else: #DSL Elastic query of Domain Controler logs
        query_res = es.search(
            index="winlogbeat-dc*",
            body={
                "size": 3,
                "sort": [
                    {
                        "timestamp": {
                            "order": "desc",
                            "unmapped_type": "boolean"
                        }
                    }
                ],
                "_source": [
                    "agent.hostname",
                    "@timestamp"
                ],
                "query": {
                    "bool": {
                    "must": [],
                    "filter": [
                        {
                        "range": {
                            "@timestamp": {
                            "format": "strict_date_optional_time",
                            "gte": f'{initial_time}',
                            "lte": f'{past_time}'
                            }
                        }
                        }
                    ],
                    "should": [],
                    "must_not": []
                    }
                }
                }
            )
    
    if query_res:
        parse_to_json =json.loads(query_res)
        final_data = json.dumps(str(parse_to_json))
   
        index = ast.literal_eval(final_data)['hits']['hits'][0]['_index']
        timestamp = ast.literal_eval(final_data)['hits']['hits'][0]['_source']['@timestamp']

        columns = ['Index','Last Updated']
        rows = [[f'{index}',f'{timestamp}']]

        with open("final_data.csv", 'w') as csv_file:
            write_to_csv = csv.writer(csv_file)
            write_to_csv.writerow(columns)
            write_to_csv.writerows(rows)
            print("CSV file created!")

    else:
        print("Log not found")
query()

CodePudding user response:

If you're really getting ' in your response, use this:

import ast
...
index = ast.literal_eval(final_data)['hits']['hits'][0]['_index']

Otherwise use this:

import json
...
index = json.loads(final_data)['hits']['hits'][0]['_index']

CodePudding user response:

Elasticsearch returns an ObjectApiResponse so you have to parse the _source field:

import json
final_data = json.loads(query_res["_source"])
index = final_data['hits']['hits'][0]['_index']

I'm not sure why you surround with parenthesis the indexing selection.

CodePudding user response:

I struggle to make sense of this:

    query_res = es.search(...)
    if query_res:
        parse_to_json =json.loads(query_res)
        final_data = json.dumps(str(parse_to_json))
   
        index = ast.literal_eval(final_data)['hits']['hits'][0]['_index']
        timestamp = ast.literal_eval(final_data)['hits']['hits'][0]['_source']['@timestamp']

query_res is an instance of ObjectApiResponse, and you can get data from it like a dictionary right away. Instead you perform a sequence of converting object to string and back again, and then "stringify" it once more, with unpredictable results.

Just do it like they do in ES docs:

first_hit = query_res['hits']['hits'][0]
index = first_hit['_index']
timestamp = first_hit['_source']['@timestamp']
  • Related