Delete pattern from a line

I have a file containing a list of websites, each quoted in this format:

      "ProxyHost": "ie.review.visa.com",
      "ProxyHost": "ocasta.zendesk.com",
      "ProxyHost": "dev.zemanta.com",
      "ProxyHost": "bharian.api.useinsider.com",
      "ProxyHost": "optout.service.mycard.visa.com",
      "ProxyHost": "ir.newrelic.com",
      "ProxyHost": "metabase.yoast.com",
4:      "ProxyHost": "designdiscoveryya.gsd.harvard.edu",
18:      "ProxyHost": "pls.law.harvard.edu",
32:      "ProxyHost": "view.jquery.com",
46:      "ProxyHost": "www.rmf.harvard.edu",
60:      "ProxyHost": "execed.sph.harvard.edu",
74:      "ProxyHost": "note.microsoft.com",
102:      "ProxyHost": "librarylab.law.harvard.edu",
116:      "ProxyHost": "api.jquery.com",
130:      "ProxyHost": "pmsdn.

The goal is to remove everything at the start of each line up to and including : " and, if possible, also delete the ", at the end of the line. It might take two passes, but let's focus on the main problem. The expected result would look something like this:

librarylab.law.harvard.edu
or
librarylab.law.harvard.edu",

Any leftover ", can easily be deleted with search and replace. Here's what I have tried:

sed "s/^*\:$//"
sed "s/^.*\:$//"
sed "/^.*#\://"
sed "s/^*\:$/d"
sed "s/^.*\:$/d"
sed -e "/^*/,s/\:/d"
and so-on...

None of the above changes the target file. I'm honestly confused; here's what I understand:

^* or ^.* : Mark Any first string.
\:$ or #\: : Mark end string :
/d or // : to Delete

Any help would be cherished.

CodePudding user response：

If the last line with the unclosed ", is a typo, then you might use

sed -E 's~^([0-9]+:)?[[:space:]]*"[^"]*":[[:space:]]*"([^"]+)",?$~\2~' file

The replacement uses capture group 2, written as \2, because capture group 1 holds the optional line-number prefix at the beginning.

The pattern matches:

^ Start of string
([0-9]+:)? Optionally capture 1+ digits and : in group 1
[[:space:]]* Match optional spaces
"[^"]*" Match from "....."
:[[:space:]]* Match : and optional spaces
"([^"]+)" Match " then capture in group 2 all between the double quotes and match the ending double quote
,? Match an optional comma
$ End of string

Output

ie.review.visa.com
ocasta.zendesk.com
dev.zemanta.com
bharian.api.useinsider.com
optout.service.mycard.visa.com
ir.newrelic.com
metabase.yoast.com
designdiscoveryya.gsd.harvard.edu
pls.law.harvard.edu
view.jquery.com
www.rmf.harvard.edu
execed.sph.harvard.edu
note.microsoft.com
librarylab.law.harvard.edu
api.jquery.com
pmsdn.
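
The output above assumes the last line is closed with ", in the real file. As a minimal sketch of what happens otherwise, the snippet below runs the same pattern on one complete line and on the truncated line from the question. The function name extract_host is a hypothetical helper introduced only for this illustration.

```shell
#!/bin/bash
# extract_host is a hypothetical helper name, not part of the answer above.
# It applies the anchored pattern to whatever arrives on stdin.
extract_host() {
  sed -E 's~^([0-9]+:)?[[:space:]]*"[^"]*":[[:space:]]*"([^"]+)",?$~\2~'
}

# One complete line and the truncated last line, both copied from the question.
printf '%s\n' \
  '102:      "ProxyHost": "librarylab.law.harvard.edu",' \
  '130:      "ProxyHost": "pmsdn.' | extract_host
```

The first line is reduced to librarylab.law.harvard.edu. The truncated line has no closing double quote, so the pattern does not match it and sed prints it unchanged.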

CodePudding user response：

You can use

sed -E 's/^([0-9]+:)?[[:space:]]*"[^"]*":[[:space:]]*"([^"]*).*/\2/' file > newfile

Details:

^ - start of string
([0-9]+:)? - an optional sequence of one or more digits and then a : char
[[:space:]]* - zero or more whitespaces
" - a double quote
[^"]* - zero or more chars other than "
": - a ": substring
[[:space:]]* - zero or more whitespaces
" - a " char
([^"]*) - Group 2: any zero or more chars other than "
.* - the rest of the line.

See the online demo:

#!/bin/bash
s='      "ProxyHost": "ie.review.visa.com",
      "ProxyHost": "ocasta.zendesk.com",
      "ProxyHost": "dev.zemanta.com",
      "ProxyHost": "bharian.api.useinsider.com",
      "ProxyHost": "optout.service.mycard.visa.com",
      "ProxyHost": "ir.newrelic.com",
      "ProxyHost": "metabase.yoast.com",
4:      "ProxyHost": "designdiscoveryya.gsd.harvard.edu",
18:      "ProxyHost": "pls.law.harvard.edu",
32:      "ProxyHost": "view.jquery.com",
46:      "ProxyHost": "www.rmf.harvard.edu",
60:      "ProxyHost": "execed.sph.harvard.edu",
74:      "ProxyHost": "note.microsoft.com",
102:      "ProxyHost": "librarylab.law.harvard.edu",
116:      "ProxyHost": "api.jquery.com",
130:      "ProxyHost": "pmsdn.'
sed -E 's/^([0-9]+:)?[[:space:]]*"[^"]*":[[:space:]]*"([^"]*).*/\2/' <<< "$s"

Output:

ie.review.visa.com
ocasta.zendesk.com
dev.zemanta.com
bharian.api.useinsider.com
optout.service.mycard.visa.com
ir.newrelic.com
metabase.yoast.com
designdiscoveryya.gsd.harvard.edu
pls.law.harvard.edu
view.jquery.com
www.rmf.harvard.edu
execed.sph.harvard.edu
note.microsoft.com
librarylab.law.harvard.edu
api.jquery.com
pmsdn.
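
If the real file also contains lines that do not follow the "key": "value" shape, the substitution leaves them untouched and they end up in the output. A possible variation, sketched here under that assumption, combines -n with the p flag so that only lines where the substitution succeeded are printed. The name hosts_only and the { and } lines in the sample input are hypothetical and not taken from the question.

```shell
#!/bin/bash
# hosts_only is a hypothetical helper name used only for this sketch.
hosts_only() {
  # -n suppresses automatic printing; the trailing p prints only the lines
  # on which the substitution actually matched.
  sed -n -E 's/^([0-9]+:)?[[:space:]]*"[^"]*":[[:space:]]*"([^"]*).*/\2/p'
}

# The brace lines are made-up filler to show that non-matching lines are dropped.
printf '%s\n' \
  '{' \
  '4:      "ProxyHost": "designdiscoveryya.gsd.harvard.edu",' \
  '      "ProxyHost": "dev.zemanta.com",' \
  '}' | hosts_only
```

This prints only the two host names; the brace lines are skipped because the pattern does not match them.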