The goal is to use jq to assign portions of hrefFull to hrefSimple and hrefSubsite within a JSON file. There may be better ways to achieve this, but I have been looking for a solution that removes everything up to the string articles in a key's value while preserving that string. Multiple objects like the example objects below are contained in a single JSON file, formatted with a [ at the start and a ] at the end.
Desired results:
- hrefFull does not change. Strings extracted from hrefFull are applied to hrefSimple and hrefSubsite.
- hrefSimple is everything after and including articles. If articles is not in the string, hrefSimple is the string after the final /. See example object 7.
- hrefSubsite is the string between https://docs.mysite.com/ and /articles...
Example results - object 1:
{
"hrefFull": "https://docs.mysite.com/product-a/articles/page-a.html",
"hrefSimple": "articles/page-a.html",
"hrefSubsite": "product-a"
}
Example results - object 2:
{
"hrefFull": "https://docs.mysite.com/product-b/articles/guide-b/page-b.html",
"hrefSimple": "articles/guide-b/page-b.html",
"hrefSubsite": "product-b"
}
Example results - object 3:
{
"hrefFull": "https://docs.mysite.com/product-c/articles/guide-c/section-c/page-c.html",
"hrefSimple": "articles/guide-c/section-c/page-c.html",
"hrefSubsite": "product-c"
}
Example results - object 4:
{
"hrefFull": "https://docs.mysite.com/product-d/sub-product-d/articles/page-d.html",
"hrefSimple": "articles/page-d.html",
"hrefSubsite": "product-d/sub-product-d"
}
Example results - object 5:
{
"hrefFull": "https://docs.mysite.com/product-e/sub-product-e/articles/guide-e/page-e.html",
"hrefSimple": "articles/guide-e/page-e.html",
"hrefSubsite": "product-e/sub-product-e"
}
Example results - object 6:
{
"hrefFull": "https://docs.mysite.com/product-f/sub-product-f/articles/guide-f/section-f/page-f.html",
"hrefSimple": "articles/guide-f/section-f/page-f.html",
"hrefSubsite": "product-f/sub-product-f"
}
Example results - object 7:
{
"hrefFull": "https://docs.mysite.com/product-g/index.html",
"hrefSimple": "index.html",
"hrefSubsite": "product-g"
}
Failed attempt (in a Bash script):
siteUrl="docs.mysite.com"
jq '
(.hrefSimple = .hrefFull)
| .hrefSimple |= (gsub("https://\($siteUrl)/.*?/"; ""))
| (.hrefSubsite = .hrefFull)
| .hrefSubsite |= (gsub("https://\($siteUrl)/"; ""))
' file-1.json > file-2.json
The script produces both accurate and inaccurate results.
Accurate results:
- Object 1
- Object 2
- Object 3
- Object 7
Inaccurate results:
- Object 4:
  - hrefSimple is incorrectly sub-product-d/articles/page-d.html instead of articles/page-d.html
  - hrefSubsite is incorrectly sub-product-d instead of product-d/sub-product-d
- Object 5:
  - hrefSimple is incorrectly sub-product-e/articles/guide-e/page-e.html instead of articles/guide-e/page-e.html
  - hrefSubsite is incorrectly sub-product-e instead of product-e/sub-product-e
- Object 6:
  - hrefSimple is incorrectly sub-product-f/articles/guide-f/section-f/page-f.html instead of articles/guide-f/section-f/page-f.html
  - hrefSubsite is incorrectly sub-product-f instead of product-f/sub-product-f
Other unsuccessful attempts (I can provide exact results if that's helpful):
- Various iterations of articles in forms of .hrefSimple |= (gsub("https://\($siteUrl)/.*?/"; "")) and .hrefSubsite |= (gsub("https://\($siteUrl)/"; ""))
- Various iterations of .hrefSimple |= split("articles")[0] (also within .hrefSubsite)
For context, if it matters, hrefFull comes from an Azure App Insights export of page views for a documentation website. The exported data is used in an analytics report. I am creating hrefSimple to join two tables and would like to filter on hrefSubsite. The paths in hrefFull are produced when generating a website using the DocFx static site generator and deploying to an Azure Blob.
CodePudding user response:
I'd use capture with a regex and merge the captured fields into each object:
. + (.hrefFull | capture(
"^https://docs.mysite.com/(?<hrefSubsite>.*?)/(?<hrefSimple>articles.*|[^/]*)$"
))
{
"hrefFull": "https://docs.mysite.com/product-a/articles/page-a.html",
"hrefSubsite": "product-a",
"hrefSimple": "articles/page-a.html"
}
{
"hrefFull": "https://docs.mysite.com/product-b/articles/guide-b/page-b.html",
"hrefSubsite": "product-b",
"hrefSimple": "articles/guide-b/page-b.html"
}
{
"hrefFull": "https://docs.mysite.com/product-c/articles/guide-c/section-c/page-c.html",
"hrefSubsite": "product-c",
"hrefSimple": "articles/guide-c/section-c/page-c.html"
}
{
"hrefFull": "https://docs.mysite.com/product-d/sub-product-d/articles/page-d.html",
"hrefSubsite": "product-d/sub-product-d",
"hrefSimple": "articles/page-d.html"
}
{
"hrefFull": "https://docs.mysite.com/product-e/sub-product-e/articles/guide-e/page-e.html",
"hrefSubsite": "product-e/sub-product-e",
"hrefSimple": "articles/guide-e/page-e.html"
}
{
"hrefFull": "https://docs.mysite.com/product-f/sub-product-f/articles/guide-f/section-f/page-f.html",
"hrefSubsite": "product-f/sub-product-f",
"hrefSimple": "articles/guide-f/section-f/page-f.html"
}
{
"hrefFull": "https://docs.mysite.com/product-g/index.html",
"hrefSubsite": "product-g",
"hrefSimple": "index.html"
}
If your input objects live in an array, wrap this filter in map(…).
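Since the question says the objects sit inside a single array, here is a minimal sketch of the map(…) variant in a Bash script. The inline sample input, the variable names input and result, and passing the host in with --arg siteUrl are my own assumptions for illustration; the regex is the same as above with the host interpolated. Note that the dots in the host are left unescaped, so they match any character, which is harmless for this data.

```shell
#!/usr/bin/env bash
# Hypothetical sample input: an array of objects that each carry hrefFull.
input='[
  {"hrefFull": "https://docs.mysite.com/product-d/sub-product-d/articles/page-d.html"},
  {"hrefFull": "https://docs.mysite.com/product-g/index.html"}
]'

siteUrl="docs.mysite.com"

# --arg exposes the shell variable to jq as $siteUrl.
# map(...) applies the merge to every object in the array.
result=$(printf '%s' "$input" | jq -c --arg siteUrl "$siteUrl" '
  map(. + (.hrefFull | capture(
    "^https://\($siteUrl)/(?<hrefSubsite>.*?)/(?<hrefSimple>articles.*|[^/]*)$"
  )))
')

printf '%s\n' "$result"
```

To run it against a file instead, drop the printf pipe and pass file-1.json as the last argument to jq, redirecting to file-2.json as in the question. An hrefFull that does not match the pattern makes capture produce no output, so that object would be dropped from the array; that case may need separate handling.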