Regex: match only string C that is in between string A and string B

Time:11-04

How can I write a regex in a shell script that targets only a specific substring between two given values? Given the example

https://www.stackoverflow.com

How can I match only the ":" between "https" and "//"? If possible, please also explain the approach.

The context is that I need to prepare a script that fetches a config from the server and appends it to the .env file. The response comes as JSON:

{
  "GRAPHQL_URL": "https://someurl/xyz",
  "PUBLIC_TOKEN": "skml2JdJyOcrVdfEJ3Bj1bs472wY8aSyprO2DsZbHIiBRqEIPBNg9S7yXBbYkndX2Lk8UuHoZ9JPdJEWaiqlIyGdwU6O5",
  "SUPER_SECRET": "MY_SUPER_SECRET"
}

so I need to adjust it to the .env syntax. What I have managed to do so far is

#!/bin/bash
CURL_RESPONSE="$(curl -s url)"

cat <<< ${CURL_RESPONSE} | jq -r '.property.source' | sed -r 's/:/=/g;s/[^a-zA-Z0-9=:_/-]//g' > .env.test

So basically I fetch the data, extract the key I am after with jq, and then use sed to first replace every ":" with "=". After that I remove the quotation marks, commas, and whitespace that come from the JSON, keeping only the characters that are necessary.

I am almost there, but the problem is that my GraphQL URL (and any other URL) now looks like this

https=//someurl/xyz

so I need to turn the = between https and // back into a colon.

Thank you very much @Nic3500 for the response. I am not sure why, but I get an error saying

sed: 1: "s/:/=/g;s#https\(.*\)// ...": \1 not defined in the RE

I searched SO and it seems that it should work, since the brackets are escaped and I use the -r flag (I tried -E, but it made no difference), and I don't know how to apply it. To be honest, I assume that the replacement block is this part

#\1#

so how can I tell it which character the match should be replaced with?

This is how I tried to use it

#!/bin/bash
CURL_RESPONSE="$(curl -s url)"

cat <<< ${CURL_RESPONSE} | jq -r '.property.source' | sed -r 's/:/=/g;s#https\(.*\)//.*#\1#;s/[^a-zA-Z0-9=:_/-]//g' > .env.test

I hope that with this context you will be able to help me.

CodePudding user response:

echo "https://www.stackoverflow.com" | sed 's#https\(.*\)//.*#\1#'
:
  • sed operator s/regexp/replacement/
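  • regexp: https\(.*\)//.*. So "https" followed by something \(.*\), followed by "//", followed by anything else .*
  • the parentheses are backslashed because, in sed's default basic regular expression syntax, \( and \) mark a group rather than literal parentheses. The group can then be referenced in the replacement part of the s### operator. With -r or -E (extended syntax) it is the other way round: plain ( ) group and \( \) match literal parentheses, so \1 ends up undefined.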
  • replacement: \1, means the first group found in the regex \(.*\)
  • I used s###, but the usual form is s///. Any character can take the place of the / with the s operator. I used # as using / would have been confusing since you use / in the url.
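As a minimal sketch of how the grouping rules above apply to the asker's case: the sample line below is hypothetical (it assumes the first sed step has already turned every ":" into "="), and the variable names are made up for illustration. Both commands put the colon back after "https"; the only difference is how the group is written with and without -E.

```shell
# Hypothetical sample line, assumed to come out of the "s/:/=/g" step.
line='GRAPHQL_URL=https=//someurl/xyz'

# Basic regular expressions (sed default): \( \) form the group.
fixed_bre=$(printf '%s\n' "$line" | sed 's#\(https\)=//#\1://#')

# Extended regular expressions (-E): plain ( ) form the group.
fixed_ere=$(printf '%s\n' "$line" | sed -E 's#(https)=//#\1://#')

printf '%s\n' "$fixed_bre" "$fixed_ere"
```

Everything after the second # is the replacement, so \1:// means "the captured https, then a literal ://". Mixing the two syntaxes (escaped parentheses together with -r or -E) is what leaves \1 without a group to refer to.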

CodePudding user response:

The problem is that your sed substitutions are terribly imprecise. Anyway, you want to do it in jq instead, where you have more control over which parts you are substituting, and avoid spawning a separate process for something jq quite easily does natively in the first place.

curl -s url |
jq -r '.property.source | to_entries[] |
  "\(.key)=\"\(.value)\""' > .env.test

Tangentially, capturing the output of curl into a variable just so you can immediately cat it once to standard output is just a waste of memory.
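To see what the jq approach produces without calling a server, here is a small sketch that feeds jq a hard-coded response. The JSON is a hypothetical stand-in: it assumes the real response nests the config under .property.source, as the filter in the question suggests, and it reuses two of the keys from the question.

```shell
# Hypothetical response; the "property" and "source" nesting is an assumption
# taken from the jq path used in the question.
response='{"property":{"source":{"GRAPHQL_URL":"https://someurl/xyz","SUPER_SECRET":"MY_SUPER_SECRET"}}}'

# to_entries turns the object into {key, value} pairs;
# each pair is then printed as KEY="value".
env_lines=$(printf '%s\n' "$response" |
  jq -r '.property.source | to_entries[] | "\(.key)=\"\(.value)\""')

printf '%s\n' "$env_lines"
```

Because the "=" is written by the jq filter itself, the colons inside the URL are never touched, so no sed clean-up is needed afterwards.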
