I have a file containing a list of websites, and each line is quoted in this form:
"ProxyHost": "ie.review.visa.com",
"ProxyHost": "ocasta.zendesk.com",
"ProxyHost": "dev.zemanta.com",
"ProxyHost": "bharian.api.useinsider.com",
"ProxyHost": "optout.service.mycard.visa.com",
"ProxyHost": "ir.newrelic.com",
"ProxyHost": "metabase.yoast.com",
4: "ProxyHost": "designdiscoveryya.gsd.harvard.edu",
18: "ProxyHost": "pls.law.harvard.edu",
32: "ProxyHost": "view.jquery.com",
46: "ProxyHost": "www.rmf.harvard.edu",
60: "ProxyHost": "execed.sph.harvard.edu",
74: "ProxyHost": "note.microsoft.com",
102: "ProxyHost": "librarylab.law.harvard.edu",
116: "ProxyHost": "api.jquery.com",
130: "ProxyHost": "pmsdn.
The goal is to remove everything at the start of each line up to and including the colon and opening quote (: ") and, if possible, also delete the trailing quote and comma (",) at the end of the line. That might take two passes, but let's focus on the main problem. The expected result would look something like this:
librarylab.law.harvard.edu
or
librarylab.law.harvard.edu",
Any leftover trailing ", can easily be removed with search-and-replace. Here's what I have tried:
sed "s/^*\:$//"
sed "s/^.*\:$//"
sed "/^.*#\://"
sed "s/^*\:$/d"
sed "s/^.*\:$/d"
sed -e "/^*/,s/\:/d"
and so on...
None of the above changes the target file. I'm honestly confused; here's what I understand:
  ^* or ^.*     mark any leading string
  \:$ or #\:    mark the end of the string at the :
  /d or //      delete
Any help would be cherished.
CodePudding user response:
If the last line, which is missing its closing ", is a typo, then you might use
sed -E 's~^([0-9]+:)?[[:space:]]*"[^"]*":[[:space:]]*"([^"]+)",?$~\2~' file
In the replacement, use capture group 2, denoted as \2, because capture group 1 is used for the optional part at the beginning.
The pattern matches:
  ^                Start of string
  ([0-9]+:)?       Optionally capture 1+ digits and : in group 1
  [[:space:]]*     Match optional whitespace
  "[^"]*"          Match from "....."
  :[[:space:]]*    Match : and optional whitespace
  "([^"]+)"        Match ", then capture in group 2 everything between the double quotes, and match the closing double quote
  ,?               Match an optional comma
  $                End of string
Output
ie.review.visa.com
ocasta.zendesk.com
dev.zemanta.com
bharian.api.useinsider.com
optout.service.mycard.visa.com
ir.newrelic.com
metabase.yoast.com
designdiscoveryya.gsd.harvard.edu
pls.law.harvard.edu
view.jquery.com
www.rmf.harvard.edu
execed.sph.harvard.edu
note.microsoft.com
librarylab.law.harvard.edu
api.jquery.com
pmsdn.
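Because this pattern is anchored with $ and requires the closing quote, a line that does not match is printed unchanged rather than trimmed. As a minimal sketch of that behaviour (the function name extract_hosts_strict and the three sample lines are only illustrative, not part of the original file), the standard -n option together with the p flag on the substitution prints only the lines where the substitution succeeded:

```shell
#!/bin/sh
# Hypothetical helper for this sketch: print the host only for lines
# that match the strict pattern; all other lines are suppressed by -n.
extract_hosts_strict() {
  sed -n -E 's~^([0-9]+:)?[[:space:]]*"[^"]*":[[:space:]]*"([^"]+)",?$~\2~p'
}

# Illustrative sample: two well-formed lines and one with an unclosed quote.
sample='"ProxyHost": "ir.newrelic.com",
102: "ProxyHost": "librarylab.law.harvard.edu",
130: "ProxyHost": "pmsdn.'

strict_out=$(printf '%s\n' "$sample" | extract_hosts_strict)
printf '%s\n' "$strict_out"
```

With this sample the unclosed line is skipped, so only the two complete hosts are printed. Dropping -n and the p flag gives the plain command shown earlier, which passes the unmatched line through as-is.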
CodePudding user response:
You can use
sed -E 's/^([0-9]+:)?[[:space:]]*"[^"]*":[[:space:]]*"([^"]*).*/\2/' file > newfile
Details:
^ - start of string
([0-9]+:)? - an optional sequence of one or more digits and then a : char
[[:space:]]* - zero or more whitespaces
" - a double quote
[^"]* - zero or more chars other than "
": - a ": substring
[[:space:]]* - zero or more whitespaces
" - a " char
([^"]*) - Group 2: any zero or more chars other than "
.* - the rest of the line.
See the online demo:
#!/bin/bash
s=' "ProxyHost": "ie.review.visa.com",
"ProxyHost": "ocasta.zendesk.com",
"ProxyHost": "dev.zemanta.com",
"ProxyHost": "bharian.api.useinsider.com",
"ProxyHost": "optout.service.mycard.visa.com",
"ProxyHost": "ir.newrelic.com",
"ProxyHost": "metabase.yoast.com",
4: "ProxyHost": "designdiscoveryya.gsd.harvard.edu",
18: "ProxyHost": "pls.law.harvard.edu",
32: "ProxyHost": "view.jquery.com",
46: "ProxyHost": "www.rmf.harvard.edu",
60: "ProxyHost": "execed.sph.harvard.edu",
74: "ProxyHost": "note.microsoft.com",
102: "ProxyHost": "librarylab.law.harvard.edu",
116: "ProxyHost": "api.jquery.com",
130: "ProxyHost": "pmsdn.'
sed -E 's/^([0-9]+:)?[[:space:]]*"[^"]*":[[:space:]]*"([^"]*).*/\2/' <<< "$s"
Output:
ie.review.visa.com
ocasta.zendesk.com
dev.zemanta.com
bharian.api.useinsider.com
optout.service.mycard.visa.com
ir.newrelic.com
metabase.yoast.com
designdiscoveryya.gsd.harvard.edu
pls.law.harvard.edu
view.jquery.com
www.rmf.harvard.edu
execed.sph.harvard.edu
note.microsoft.com
librarylab.law.harvard.edu
api.jquery.com
pmsdn.
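One likely reason the attempts in the question appeared to change nothing in the target file is that sed writes its result to standard output and leaves the input file untouched, which is why the command above redirects to newfile. A small sketch of that step, assuming a throwaway directory and hypothetical file names (hosts.txt, hosts.clean.txt) that are not from the original post:

```shell
#!/bin/sh
# Hypothetical scratch directory and file names, used only for this sketch.
workdir=$(mktemp -d)
infile="$workdir/hosts.txt"
outfile="$workdir/hosts.clean.txt"

# Two illustrative input lines in the same shape as the question's data.
printf '%s\n' ' "ProxyHost": "dev.zemanta.com",' \
              '116: "ProxyHost": "api.jquery.com",' > "$infile"

# sed prints the edited text to standard output, so redirect it to a new file.
sed -E 's/^([0-9]+:)?[[:space:]]*"[^"]*":[[:space:]]*"([^"]*).*/\2/' "$infile" > "$outfile"

cleaned=$(cat "$outfile")               # contents of the new file
original_first=$(head -n 1 "$infile")   # the input file is still unmodified
printf '%s\n' "$cleaned"

rm -r "$workdir"
```

Writing to a separate file and then inspecting it avoids overwriting the original before the result has been checked.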