Find a list of unique IP addresses from a JSON log file in Python


How do I write a program in Python that finds a list of the unique IP addresses in a JSON file?

I am a newbie at Python and I have the following data in JSON format. I want to find the unique values of the "remoteIp" key.

{
    "jsonPayload": {
      "enforcedSecurityPolicy": {
        "configuredAction": "DENY",
        "preconfiguredExprIds": [
          "owasp-crs-v030001-id942220-sqli"
        ],
        "priority": 2000,
        "outcome": "DENY"
      }
    },
    "httpRequest": {
      "requestMethod": "POST",
      "requestUrl": "https://wwwwwww.google.com///n",
      "requestSize": "3004",
      "status": 403,
      "responseSize": "274",
      "userAgent": "okhttp/3.12.2",
      "remoteIp": "182.2.169.59",
      "serverIp": "10.114.44.4"
    }
}

The solution I have written so far fetches all the remoteIp values, but they are not unique.

import json
#unique_ip = {}
with open("automation.json") as file:
    data = json.load(file)
    for d1 in data:
        del d1['resource'], d1['timestamp'], d1['severity'], d1['logName'], d1['trace'], d1['spanId'], \
            d1['receiveTimestamp'], d1['jsonPayload']['statusDetails'], d1['jsonPayload']['@type'], d1['insertId'], \
            d1['jsonPayload']['enforcedSecurityPolicy']['name'], d1['httpRequest']['latency']

        # using d1['insertId'] above for uniquely identifying a record
        #print(d1['httpRequest']['remoteIp'])  #d1['jsonPayload']['enforcedSecurityPolicy'])

with open('automation_new.json', 'w') as file:
    json.dump(data, file, indent=2)

for d2 in data:
    s1 = d2['httpRequest']['requestUrl']
    s2 = d2['httpRequest']['requestMethod']
    s3 = d2['httpRequest']['remoteIp']
    s4 = str(d2['httpRequest']['status'])
    s5 = d2['httpRequest']['userAgent']
    #mylist = list((s1.split(), s2.split(), s3.split(), s5.split(), s4.split()))
    #mylist = list((s1, s2, s3, s4, s5))
    #def unique(s3):
    #    x = np.array(s3)
    #    print(np.unique(x))
    print(s3)
file.close()

CodePudding user response:

I didn't really understand precisely what you want to do, since remoteIp is a single field and not a list, so how can it have duplicate IP addresses?

Anyway, I would suggest storing the info retrieved from the JSON file inside a defaultdict, which will keep its keys unique without you having to worry about it.

Once you're done, you can iterate over your defaultdict and extract the unique keys you want :)

I encourage you to read the defaultdict documentation and take a look at this question here.
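
A minimal sketch of that idea (assuming automation.json contains a JSON array of records shaped like the one in the question, and assuming you want the requestUrl values grouped under each IP):

import json
from collections import defaultdict

with open("automation.json") as file:
    data = json.load(file)

# dict keys are unique by definition, so each remoteIp can only appear once
requests_by_ip = defaultdict(list)
for record in data:
    ip = record['httpRequest']['remoteIp']
    requests_by_ip[ip].append(record['httpRequest']['requestUrl'])

print(list(requests_by_ip.keys()))  # the unique remote IPs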

p.s. when you open a file inside a with block you don't have to close it afterwards; the file is closed automatically once you exit the with block.
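
For example, reusing the write-out step from the question:

with open('automation_new.json', 'w') as file:
    json.dump(data, file, indent=2)

print(file.closed)  # True: the with block already closed it, so file.close() is redundant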

CodePudding user response:

Use a set():

a = ['a', 'b', 'a']
b = set(a)
print(b)
# {'a', 'b'}

Please also print the type of s3, then try something like this:

unique_ip = set()  # note: set(), not {} (an empty {} creates a dict)

for d2 in data:
    #...
    s3 = d2['httpRequest']['remoteIp']
    #...
    unique_ip.add(s3)
    print("length of unique ip set is " + str(len(unique_ip)))

print(unique_ip)

CodePudding user response:

You could use a set to maintain unique values:

unique_remote_ips = set()
for d2 in data:
    s1 = d2['httpRequest']['requestUrl']
    s2 = d2['httpRequest']['requestMethod']
    unique_remote_ips.add(d2['httpRequest']['remoteIp'])
    s4 = str(d2['httpRequest']['status'])
    s5 = d2['httpRequest']['userAgent']
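
After the loop, unique_remote_ips contains each remoteIp exactly once, so printing it (or wrapping it in list(...)) gives the list of unique addresses the question asks for.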