I have a large file that contains multiple entries that look like these below:
{"author":["frack113"],"description":"Detects a Sysmon configuration change, which could be the result of a legitimate reconfiguration or someone trying manipulate the configuration","ruleId":"8ac03a65-6c84-4116-acad-dc1558ff7a77","falsePositives":["Legitimate administrative action"],"from":"now-360s","immutable":false,"outputIndex":".siem-signals-default","meta":{"from":"1m"},"maxSignals":100,"riskScore":35,"riskScoreMapping":[],"severity":"medium","severityMapping":[],"threat":[{"tactic":{"id":"TA0005","reference":"https://attack.mitre.org/tactics/TA0005","name":"Defense Evasion"},"framework":"MITRE ATT&CK®","technique":[]}],"to":"now","references":["https://docs.microsoft.com/en-us/sysinternals/downloads/sysmon"],"version":1,"exceptionsList":[],"index":["winlogbeat-*"],"query":"(winlog.channel:\"Microsoft\\-Windows\\-Sysmon\\/Operational\" AND winlog.event_id:\"16\")","language":"lucene","filters":[],"type":"query"},"schedule":{"interval":"5m"}}
I am working on a Python program to extract the string that follows the key "query". For example, in
"query":"(winlog.channel:\"Microsoft\\-Windows\\-Sysmon\\/Operational\" AND winlog.event_id:\"16\")"
I am trying to detect (winlog.channel:\"Microsoft\\-Windows\\-Sysmon\\/Operational\" AND winlog.event_id:\"16\")
There are many of these to extract, and I then want to compare them against the "query" values in another file to find any similarities.
I tried the following regexes, but neither of them matches "query" at all:
(?<=^\"query\":\W)(\w.*)$
and
(?<='{\"query\"}':\s)'?([^'}},] )
I would appreciate any pointers, as I have been stuck on this for hours!
CodePudding user response:
Your question also has the python tag, so I am assuming a solution involving a Python script is fine.
Given a file data.txt whose entries look like the example in your question, you can run the following script to print the required strings.
def main():
    with open('data.txt') as f:
        for line in f:
            # Everything after the first occurrence of the word "query".
            rest = line.split("query")[1]
            # Keep the text up to the first ")" and drop the leading '":'.
            result = rest.split(")")[0][2:]
            print(result)

main()
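For the example string you have provided, this script prints the following (the opening quote is kept, and the closing parenthesis is lost because the line is split on ")"):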
"(winlog.channel:\"Microsoft\\-Windows\\-Sysmon\\/Operational\" AND winlog.event_id:\"16\"
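To get the whole value, including the closing parenthesis, a minimal sketch could locate the "query" key and let the json module decode just the string that follows it. This works even when the line as a whole is not valid JSON. The helper name extract_query is hypothetical, and the sketch assumes the first "query": on a line is the one you want.

```python
import json

# Hypothetical helper, not part of the original answer.
_DECODER = json.JSONDecoder()
_KEY = '"query":'

def extract_query(line):
    """Return the decoded string stored under the first "query" key, or None."""
    start = line.find(_KEY)
    if start == -1:
        return None
    pos = start + len(_KEY)
    # raw_decode does not skip leading whitespace, so step over it here.
    while pos < len(line) and line[pos] in " \t":
        pos += 1
    try:
        # Decode exactly one JSON value starting at pos; escapes are handled.
        value, _end = _DECODER.raw_decode(line, pos)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, str) else None
```

Calling extract_query(line) for each line of data.txt returns the decoded value, so \" in the file comes back as a plain " and \\ as a single backslash. If you need the raw escaped text instead, a regex on the raw line is the simpler route.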
Hope it helps!
CodePudding user response:
There are two ways to do it: 1) read the file in as JSON, then iterate through the dictionary; 2) read it in as a str and apply a regex.
1. Read in as json:
import json

file = 'exportedSignal.ndjson'
# Parse the whole file as a single JSON document.
with open(file, 'r', encoding='cp850') as f:
    jsonData = json.load(f)

queries = []
hits = jsonData['hits']['hits']
for hit in hits:
    params = hit['_source']['alert']['params']
    # Not every hit has a "query" key, so check first.
    if 'query' in params:
        queries.append(params['query'])
print(queries)
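json.load expects the file to be a single JSON document. If your file instead holds one JSON object per line, as the .ndjson extension suggests, it raises a JSONDecodeError, and each line has to be parsed on its own. Below is a minimal sketch under that assumption. The helper names iter_query_values and queries_from_ndjson are hypothetical, and the sketch searches for "query" at any depth rather than relying on the hits/_source/alert/params layout.

```python
import json

def iter_query_values(node):
    """Yield every string stored under a "query" key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "query" and isinstance(value, str):
                yield value
            else:
                yield from iter_query_values(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_query_values(item)

def queries_from_ndjson(lines):
    """Parse each non-empty line as its own JSON document and collect queries."""
    queries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        queries.extend(iter_query_values(json.loads(line)))
    return queries
```

An open file object can be passed directly, for example queries_from_ndjson(f) inside the with block, since iterating over a file yields its lines.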
2. Use Regex:
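import re

file = 'exportedSignal.ndjson'
with open(file, 'r', encoding='cp850') as f:
    data = f.read()

# Match the whole JSON string after "query":, allowing escaped quotes (\")
# inside it; a plain (.*?)" would stop at the first escaped quote.
queries = re.findall(r'"query":"((?:[^"\\]|\\.)*)"', data)
print(queries)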
Output:
Both produce a list of 2006 values from the "query" key.
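For the comparison against the "query" values from another file that the question asks about, one possible sketch is to take the set intersection for exact matches and use difflib.SequenceMatcher from the standard library for near matches. The function name compare_queries and the 0.9 threshold are assumptions chosen for illustration, not something from the question.

```python
from difflib import SequenceMatcher

def compare_queries(queries_a, queries_b, threshold=0.9):
    """Return (exact, similar) for two lists of query strings.

    exact   -- set of queries present in both lists
    similar -- list of (a, b, ratio) for non-identical pairs whose
               similarity ratio is at least `threshold`
    """
    set_a, set_b = set(queries_a), set(queries_b)
    exact = set_a & set_b
    similar = []
    # Only compare queries that did not already match exactly.
    for a in sorted(set_a - exact):
        for b in sorted(set_b - exact):
            ratio = SequenceMatcher(None, a, b).ratio()
            if ratio >= threshold:
                similar.append((a, b, ratio))
    return exact, similar
```

The near-match step compares every remaining pair, so its cost grows with the product of the two list sizes. With a couple of thousand queries per file that is millions of comparisons, which may take a while, so it can be worth starting with the exact matches only.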