Assign strings from a key value in a JSON file to other keys in the file using jq

Time:12-08

The goal is to use jq to assign portions of hrefFull to hrefSimple and hrefSubsite within a JSON file. There may be better ways to achieve this, but my approach has been to look for a solution that removes everything up to the string articles in a key's value while preserving that string. Multiple objects like the example objects below are contained in a single JSON file that starts with [ and ends with ].

Desired results:

  • hrefFull does not change. Strings extracted from hrefFull are applied to hrefSimple and hrefSubsite.
  • hrefSimple is everything after and including articles. If articles is not in the string, hrefSimple is the string after the final /. See example object 7.
  • hrefSubsite is the string between https://docs.mysite.com/ and /articles....

Example results - object 1:

{
  "hrefFull": "https://docs.mysite.com/product-a/articles/page-a.html",
  "hrefSimple": "articles/page-a.html",
  "hrefSubsite": "product-a"
}

Example results - object 2:

{
  "hrefFull": "https://docs.mysite.com/product-b/articles/guide-b/page-b.html",
  "hrefSimple": "articles/guide-b/page-b.html",
  "hrefSubsite": "product-b"
}

Example results - object 3:

{
  "hrefFull": "https://docs.mysite.com/product-c/articles/guide-c/section-c/page-c.html",
  "hrefSimple": "articles/guide-c/section-c/page-c.html",
  "hrefSubsite": "product-c"
}

Example results - object 4:

{
  "hrefFull": "https://docs.mysite.com/product-d/sub-product-d/articles/page-d.html",
  "hrefSimple": "articles/page-d.html",
  "hrefSubsite": "product-d/sub-product-d"
}

Example results - object 5:

{
  "hrefFull": "https://docs.mysite.com/product-e/sub-product-e/articles/guide-e/page-e.html",
  "hrefSimple": "articles/guide-e/page-e.html",
  "hrefSubsite": "product-e/sub-product-e"
}

Example results - object 6:

{
  "hrefFull": "https://docs.mysite.com/product-f/sub-product-f/articles/guide-f/section-f/page-f.html",
  "hrefSimple": "articles/guide-f/section-f/page-f.html",
  "hrefSubsite": "product-f/sub-product-f"
}

Example results - object 7:

{
  "hrefFull": "https://docs.mysite.com/product-g/index.html",
  "hrefSimple": "index.html",
  "hrefSubsite": "product-g"
}

Failed attempt (in a Bash script):

siteUrl="docs.mysite.com"
jq '
(.hrefSimple = .hrefFull)
| .hrefSimple |= (gsub("https://\($siteUrl)/.*?/"; ""))
| (.hrefSubsite = .hrefFull)
| .hrefSubsite |= (gsub("https://\($siteUrl)/"; ""))
' file-1.json > file-2.json

The script produces both accurate and inaccurate results.

Accurate results:

  • Object 1
  • Object 2
  • Object 3
  • Object 7

Inaccurate results:

  • Object 4:
    • hrefSimple is incorrectly sub-product-d/articles/page-d.html instead of articles/page-d.html
    • hrefSubsite is incorrectly sub-product-d instead of product-d/sub-product-d
  • Object 5:
    • hrefSimple is incorrectly sub-product-e/articles/guide-e/page-e.html instead of articles/guide-e/page-e.html
    • hrefSubsite is incorrectly sub-product-e instead of product-e/sub-product-e
  • Object 6:
    • hrefSimple is incorrectly sub-product-f/articles/guide-f/section-f/page-f.html instead of articles/guide-f/section-f/page-f.html
    • hrefSubsite is incorrectly sub-product-f instead of product-f/sub-product-f

Other unsuccessful attempts (I can provide exact results if that's helpful):

  • Various iterations of articles in forms of .hrefSimple |= (gsub("https://\($siteUrl)/.*?/"; "")) and .hrefSubsite |= (gsub("https://\($siteUrl)/"; ""))
  • Various iterations of .hrefSimple |= split("articles")[0] (also within .hrefSubsite)

For context, if it matters, hrefFull comes from an Azure App Insights export of page views for a documentation website. The exported data is used in an analytics report. I am creating hrefSimple to join two tables and would like to filter on hrefSubsite. The paths in hrefFull are produced when generating a website using the DocFx static site generator and deploying to an Azure Blob.

CodePudding user response:

I'd use capture with a regex:

. + (.hrefFull | capture(
  "^https://docs.mysite.com/(?<hrefSubsite>.*?)/(?<hrefSimple>articles.*|[^/]*)$"
))
{
  "hrefFull": "https://docs.mysite.com/product-a/articles/page-a.html",
  "hrefSubsite": "product-a",
  "hrefSimple": "articles/page-a.html"
}
{
  "hrefFull": "https://docs.mysite.com/product-b/articles/guide-b/page-b.html",
  "hrefSubsite": "product-b",
  "hrefSimple": "articles/guide-b/page-b.html"
}
{
  "hrefFull": "https://docs.mysite.com/product-c/articles/guide-c/section-c/page-c.html",
  "hrefSubsite": "product-c",
  "hrefSimple": "articles/guide-c/section-c/page-c.html"
}
{
  "hrefFull": "https://docs.mysite.com/product-d/sub-product-d/articles/page-d.html",
  "hrefSubsite": "product-d/sub-product-d",
  "hrefSimple": "articles/page-d.html"
}
{
  "hrefFull": "https://docs.mysite.com/product-e/sub-product-e/articles/guide-e/page-e.html",
  "hrefSubsite": "product-e/sub-product-e",
  "hrefSimple": "articles/guide-e/page-e.html"
}
{
  "hrefFull": "https://docs.mysite.com/product-f/sub-product-f/articles/guide-f/section-f/page-f.html",
  "hrefSubsite": "product-f/sub-product-f",
  "hrefSimple": "articles/guide-f/section-f/page-f.html"
}
{
  "hrefFull": "https://docs.mysite.com/product-g/index.html",
  "hrefSubsite": "product-g",
  "hrefSimple": "index.html"
}
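
The script in the question reads the host name from a shell variable. As a minimal sketch, that variable could be passed into the same capture filter with jq's --arg option. The sample input here is just object 4 from the question, and the variable name result is only for illustration. Note that the dots in the host name are treated as regex wildcards, as they are in the filter above.

```shell
#!/usr/bin/env bash
# Host name taken from the question's script.
siteUrl="docs.mysite.com"

# --arg makes the shell value available inside the jq program as $siteUrl,
# where it is interpolated into the regex with \( ... ).
result=$(jq -c --arg siteUrl "$siteUrl" '
  . + (.hrefFull | capture(
    "^https://\($siteUrl)/(?<hrefSubsite>.*?)/(?<hrefSimple>articles.*|[^/]*)$"
  ))
' <<'EOF'
{"hrefFull": "https://docs.mysite.com/product-d/sub-product-d/articles/page-d.html"}
EOF
)

printf '%s\n' "$result"
```

Because the right-hand side of + wins on duplicate keys, any hrefSimple or hrefSubsite already present in the object is overwritten by the captured values.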


If your input objects live in an array, wrap this filter into a map(…).
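
A rough sketch of the array form, assuming the input looks like the file described in the question. The two sample objects and the variable names input and output are only for illustration; in practice the input would come from file-1.json and be redirected to file-2.json as in the question's script.

```shell
#!/usr/bin/env bash
# Hypothetical two-element sample standing in for the question's file-1.json.
input='[
  {"hrefFull": "https://docs.mysite.com/product-a/articles/page-a.html"},
  {"hrefFull": "https://docs.mysite.com/product-g/index.html"}
]'

# map(...) applies the capture filter to every object in the array
# and collects the results back into an array.
output=$(printf '%s' "$input" | jq -c '
  map(. + (.hrefFull | capture(
    "^https://docs.mysite.com/(?<hrefSubsite>.*?)/(?<hrefSimple>articles.*|[^/]*)$"
  )))
')

printf '%s\n' "$output"
```

One thing to watch for: capture produces no output when the regex does not match, so an object whose hrefFull does not fit the pattern would be dropped from the resulting array rather than passed through unchanged.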
