I'll try to explain the problem as succinctly as possible. I'm trying to filter some values from a log file coming from Elastic. The log outputs this JSON exactly:
{
"took": 2,
"timed_out": False,
"_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
"hits": {
"total": {"value": 2, "relation": "eq"},
"max_score": None,
"hits": [
{
"_index": "winlogbeat-dc-2022.10.17-000014",
"_type": "_doc",
"_id": "vOCnfoQBeS2JF7giMG9q",
"_score": None,
"_source": {
"agent": {"hostname": "SRVDC1"},
"@timestamp": "2022-11-16T04:19:13.622Z",
},
"sort": [-9223372036854775808],
},
{
"_index": "winlogbeat-dc-2022.10.17-000014",
"_type": "_doc",
"_id": "veCnfoQBeS2JF7giMG9q",
"_score": None,
"_source": {
"agent": {"hostname": "SRVDC1"},
"@timestamp": "2022-11-16T04:19:13.630Z",
},
"sort": [-9223372036854775808],
},
],
},
}
Now, I want to filter out only the _index and @timestamp keys. If I assign this JSON to a variable, I can perfectly filter out the two keys by running:
index = (data['hits']['hits'][0]['_index'])
timestamp = (data['hits']['hits'][0]['_source']['@timestamp'])
Output:
winlogbeat-dc*
2022-11-16T04:19:13.622Z
However, if I try to do the same directly from the server call, I get:
Traceback (most recent call last):
File "c:\Users\user\Desktop\PYTHON\tiny2.py", line 96, in <module>
query()
File "c:\Users\user\Desktop\PYTHON\tiny2.py", line 77, in query
index = (final_data['hits']['hits'][0]['_index'])
TypeError: string indices must be integers
Now, I understand the it's asking for integer values instead of the strings I'm using, but if I use integers, then I get individual characters rather than a key/value pair.
What am I missing?
UPDATE: Below is the entire code, but it won't help much. It contains Elastic's DSL query language, and a call to the server, which obviously you won't be able to connect to. I tried your suggestions, but I either get the same error, or a new one:
raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not ObjectApiResponse
Entire code as follows:
import os
import ast
import csv
import json
from elasticsearch import Elasticsearch
from datetime import datetime,timedelta
import datetime
ELASTIC_USERNAME = 'elastic'
ELASTIC_PASSWORD = "abc123"
PORT= str('9200')
HOST = str('10.20.20.131')
CERT = os.path.join(os.path.dirname(__file__),"cert.crt")
initial_time = datetime.datetime.now()
past_time = datetime.datetime.now() - (timedelta(minutes=15))
def query():
try: #connection to Elastic server
es = Elasticsearch(
"https://10.20.20.131:9200",
ca_certs = CERT,
verify_certs=False,
basic_auth = (ELASTIC_USERNAME, ELASTIC_PASSWORD)
)
except ConnectionRefusedError as error:
print("[-] Connection error")
else: #DSL Elastic query of Domain Controler logs
query_res = es.search(
index="winlogbeat-dc*",
body={
"size": 3,
"sort": [
{
"timestamp": {
"order": "desc",
"unmapped_type": "boolean"
}
}
],
"_source": [
"agent.hostname",
"@timestamp"
],
"query": {
"bool": {
"must": [],
"filter": [
{
"range": {
"@timestamp": {
"format": "strict_date_optional_time",
"gte": f'{initial_time}',
"lte": f'{past_time}'
}
}
}
],
"should": [],
"must_not": []
}
}
}
)
if query_res:
parse_to_json =json.loads(query_res)
final_data = json.dumps(str(parse_to_json))
index = ast.literal_eval(final_data)['hits']['hits'][0]['_index']
timestamp = ast.literal_eval(final_data)['hits']['hits'][0]['_source']['@timestamp']
columns = ['Index','Last Updated']
rows = [[f'{index}',f'{timestamp}']]
with open("final_data.csv", 'w') as csv_file:
write_to_csv = csv.writer(csv_file)
write_to_csv.writerow(columns)
write_to_csv.writerows(rows)
print("CSV file created!")
else:
print("Log not found")
query()
CodePudding user response:
If you're really getting '
in your response, use this:
import ast
...
index = ast.literal_eval(final_data)['hits']['hits'][0]['_index']
Otherwise use this:
import json
...
index = json.loads(final_data)['hits']['hits'][0]['_index']
CodePudding user response:
Elasticsearch returns an ObjectApiResponse so you have to parse the _source
field:
import json
final_data = json.loads(query_res["_source"])
index = final_data['hits']['hits'][0]['_index']
I'm not sure why you surround with parenthesis the indexing selection.
CodePudding user response:
I struggle to make sense of this:
query_res = es.search(...)
if query_res:
parse_to_json =json.loads(query_res)
final_data = json.dumps(str(parse_to_json))
index = ast.literal_eval(final_data)['hits']['hits'][0]['_index']
timestamp = ast.literal_eval(final_data)['hits']['hits'][0]['_source']['@timestamp']
query_res
is an instance of ObjectApiResponse
, and you can get data from it like a dictionary right away. Instead you perform a sequence of converting object to string and back again, and then "stringify" it once more, with unpredictable results.
Just do it like they do in ES docs:
first_hit = query_res['hits']['hits'][0]
index = first_hit['_index']
timestamp = first_hit['_source']['@timestamp']