I have a long list of json file . I want to extract the subdomain of harvard.edu which is in the variable host in "host": "ceonlineb2b.hms.harvard.edu using bash . I would be happy if anyone can help out .Below is only a snippet of json file.
{
"data": {
"total_items": 3,
"offset": 0,
"limit": 1,
"items": [
{
"name": "ceonlineb2b.hms.harvard.edu",
"alexa": null,
"cert_summary": null,
"dns_records": {
"A": [
"3.221.168.206",
"54.174.253.3"
],
"AAAA": null,
"CAA": null,
"CNAME": [
"hms-moodleb2b-prod.cabem.com"
],
"MX": null,
"NS": null,
"SOA": null,
"TXT": null,
"SPF": null,
"updated_at": "2021-05-14T23:12:43.332816923Z"
},
"hosts_enrichment": [
{
"ip": "3.221.168.206",
"as_num": 14618,
"as_org": "amazon-aes",
"isp": "amazon.com",
"city_name": "ashburn",
"country": "united states",
"country_iso_code": "us",
"location": {
"lat": 39.0481,
"lon": -77.4728
}
},
{
"ip": "54.174.253.3",
"as_num": 14618,
"as_org": "amazon-aes",
"isp": "amazon.com",
"city_name": "ashburn",
"country": "united states",
"country_iso_code": "us",
"location": {
"lat": 39.0481,
"lon": -77.4728
}
}
],
"http_extract": {
"cookies": [
{
"domain": "",
"expire": "0001-01-01T00:00:00Z",
"http_only": true,
"key": "MoodleSession",
"max_age": 0,
"path": "/",
"security": true,
"value": "tqhmqc4muk513sad1bmnl3kocj"
}
],
"description": "",
"emails": null,
"final_redirect_url": {
"full_uri": "https://ceonlineb2b.hms.harvard.edu/login/index.php",
"host": "ceonlineb2b.hms.harvard.edu",
"path": "/login/index.php"
},
"extracted_at": "2020-10-04T20:55:26.043777194Z",
"favicon_sha256": "",
"http_headers": [
{
"name": "date",
"value": "Sun, 04 Oct 2020 20:55:25 GMT"
},
{
"name": "content-type",
"value": "text/html; charset=utf-8"
},
{
"name": "server",
"value": "Apache/2.4.46 () OpenSSL/1.0.2k-fips"
},
{
"name": "x-powered-by",
"value": "PHP/7.2.24"
},
{
"name": "content-language",
"value": "en"
},
{
"name": "content-script-type",
"value": "text/javascript"
},
{
"name": "content-style-type",
"value": "text/css"
},
{
"name": "x-ua-compatible",
"value": "IE=edge"
},
{
"name": "cache-control",
"value": "private, pre-check=0, post-check=0, max-age=0, no-transform"
},
{
"name": "pragma",
"value": "no-cache"
},
{
"name": "expires",
"value": ""
},
{
"name": "accept-ranges",
"value": "none"
},
{
"name": "set-cookie",
"value": "MoodleSession=tqhmqc4muk513sad1bmnl3kocj; path=/; secure;HttpOnly;Secure;SameSite=None"
}
],
"http_status_code": 200,
"links": [
{
"anchor": "Forgotten your username or password?",
"url": "https://ceonlineb2b.hms.harvard.edu/login/forgot_password.php",
"url_host": "ceonlineb2b.hms.harvard.edu"
},
{
"anchor": "Privacy Statement",
"url": "/local/staticpage/view.php?page=privacy-statement",
"url_host": ""
},
{
"anchor": "Terms of Service",
"url": "/local/staticpage/view.php?page=terms-of-service",
"url_host": ""
},
{
"anchor": "Copyright Information",
"url": "/local/staticpage/view.php?page=copyright-information",
"url_host": ""
}
],
"meta_tags": [
{
"name": "keywords",
"value": "moodle, HMS Postgraduate Courses: Log in to the site"
},
{
"name": "format-detection",
"value": "telephone=no"
},
{
"name": "robots",
"value": "noindex"
},
{
"name": "viewport",
"value": "width=device-width, initial-scale=1.0"
}
],
"robots_txt": "",
"scripts": [
"https://ceonlineb2b.hms.harvard.edu/theme/yui_combo.php?rollup/3.17.2/yui-moodlesimple-min.js",
"https://ceonlineb2b.hms.harvard.edu/lib/javascript.php/1589465014/lib/javascript-static.js",
"https://ceonlineb2b.hms.harvard.edu/lib/javascript.php/1589465014/lib/requirejs/require.min.js",
"https://ceonlineb2b.hms.harvard.edu/theme/javascript.php/hms/1589465013/footer"
],
"styles": [
"https://ceonlineb2b.hms.harvard.edu/theme/yui_combo.php?rollup/3.17.2/yui-moodlesimple-min.css",
"https://ceonlineb2b.hms.harvard.edu/theme/styles.php/hms/1589465013_1/all"
],
"title": "HMS Postgraduate Courses: Log in to the site"
},
"is_CNAME": null,
"is_MX": null,
"is_NS": null,
"is_PTR": null,
"is_subdomain": true,
"name_without_suffix": "ceonlineb2b.hms.harvard",
"updated_at": "2021-05-16T10:25:01.59086376Z",
"user_scan_at": null,
"whois_parsed": null,
"security_score": {
"score": 100
},
"cve_list": null,
"technologies": [
{
"name": "Moodle",
"version": ""
},
{
"name": "RequireJS",
"version": ""
}
],
"trackers": null,
"organizations": null
}
]
}
}
CodePudding user response:
For json parsing on bash, I recommend checking out jq. It's lightweight and versatile.
We can use the -r flag to output only values.
Output the fields of each object with the keys in sorted order.
--raw-output / -r:
The structure of the JSON you provided has the subdomain at .data.items[].http_extract.final_redirect_url.host
{
"data": {
"items": [
{
"http_extract": {
"final_redirect_url": {
"full_uri": "https://ceonlineb2b.hms.harvard.edu/login/index.php",
"host": "ceonlineb2b.hms.harvard.edu",
"path": "/login/index.php"
},
...
I've saved your json to a file, se.json
Example extracting full domain with jq
jq -r '.data.items[].http_extract.final_redirect_url.host' se.json
Output
ceonlineb2b.hms.harvard.edu
To extract the subdomain, just perform a search/replace using sub().
sub(regex; tostring) sub(regex; string; flags)
Emit the string obtained by replacing the first match of regex in the input string with tostring, after interpolation. tostring should be a jq string, and may contain references to named captures. The named captures are, in effect, presented as a JSON object (as constructed by capture) to tostring, so a reference to a captured variable named "x" would take the form: "(.x)".
Extracting subdomain using jq
jq -r '.data.items[].http_extract.final_redirect_url.host | sub(".hms.harvard.edu";"")' se.json
Output
ceonlineb2b