Delete pattern from a line

Time:06-06

I have a file containing a list of websites, quoted in this format:

      "ProxyHost": "ie.review.visa.com",
      "ProxyHost": "ocasta.zendesk.com",
      "ProxyHost": "dev.zemanta.com",
      "ProxyHost": "bharian.api.useinsider.com",
      "ProxyHost": "optout.service.mycard.visa.com",
      "ProxyHost": "ir.newrelic.com",
      "ProxyHost": "metabase.yoast.com",
4:      "ProxyHost": "designdiscoveryya.gsd.harvard.edu",
18:      "ProxyHost": "pls.law.harvard.edu",
32:      "ProxyHost": "view.jquery.com",
46:      "ProxyHost": "www.rmf.harvard.edu",
60:      "ProxyHost": "execed.sph.harvard.edu",
74:      "ProxyHost": "note.microsoft.com",
102:      "ProxyHost": "librarylab.law.harvard.edu",
116:      "ProxyHost": "api.jquery.com",
130:      "ProxyHost": "pmsdn.

The goal is to remove everything at the start of each line up to and including : " and, if possible, also delete the ", at the end of the line. It might require two passes, but let's focus on the main problem. The expected result would look something like this:

librarylab.law.harvard.edu
or
librarylab.law.harvard.edu",

Any leftover ", can be deleted easily with search-and-replace. Here's what I have tried:

sed "s/^*\:$//"
sed "s/^.*\:$//"
sed "/^.*#\://"
sed "s/^*\:$/d"
sed "s/^.*\:$/d"
sed -e "/^*/,s/\:/d"
and so-on...

None of the above changes the target file. I'm honestly confused; here's what I understand:

  • ^* or ^.* : Mark Any first string.
  • \:$ or #\: : Mark end string :
  • /d or // : to Delete

Any help would be cherished.

CodePudding user response:

If the last line, which is missing its closing ", is a typo, then you might use

sed -E 's~^([0-9]+:)?[[:space:]]*"[^"]*":[[:space:]]*"([^"]+)",?$~\2~' file

In the replacement, use capture group 2, denoted as \2, because capture group 1 is used for the optional part at the beginning.

The pattern matches:

  • ^ Start of string
  • ([0-9]+:)? Optionally capture 1+ digits and : in group 1
  • [[:space:]]* Match optional spaces
  • "[^"]*" Match a double-quoted string "....."
  • :[[:space:]]* Match : and optional spaces
  • "([^"]+)" Match ", capture in group 2 everything between the double quotes, then match the closing double quote
  • ,? Match an optional comma
  • $ End of string
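
As a quick check of the pattern above, here is a minimal sketch that runs it on three sample lines taken from the question. The function name extract_hosts is only an illustrative wrapper and not part of the original answer. Because the pattern is anchored and requires the closing quote, a line without it does not match, and sed prints that line unchanged.

```shell
#!/bin/bash
# Hypothetical helper (name chosen for illustration) wrapping the sed command above.
extract_hosts() {
  sed -E 's~^([0-9]+:)?[[:space:]]*"[^"]*":[[:space:]]*"([^"]+)",?$~\2~'
}

# Two well-formed lines and the truncated last line from the question.
printf '%s\n' \
  '      "ProxyHost": "dev.zemanta.com",' \
  '18:      "ProxyHost": "pls.law.harvard.edu",' \
  '130:      "ProxyHost": "pmsdn.' | extract_hosts
```

The first two lines come out as bare host names; the third is passed through as-is, which is why this command only fits if the truncated line is a typo.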

Output

ie.review.visa.com
ocasta.zendesk.com
dev.zemanta.com
bharian.api.useinsider.com
optout.service.mycard.visa.com
ir.newrelic.com
metabase.yoast.com
designdiscoveryya.gsd.harvard.edu
pls.law.harvard.edu
view.jquery.com
www.rmf.harvard.edu
execed.sph.harvard.edu
note.microsoft.com
librarylab.law.harvard.edu
api.jquery.com
pmsdn.

CodePudding user response:

You can use

sed -E 's/^([0-9]+:)?[[:space:]]*"[^"]*":[[:space:]]*"([^"]*).*/\2/' file > newfile

Details:

  • ^ - start of string
  • ([0-9]+:)? - Group 1: an optional sequence of one or more digits and then a : char
  • [[:space:]]* - zero or more whitespaces
  • " - a double quote
  • [^"]* - zero or more chars other than "
  • ": - a ": substring
  • [[:space:]]* - zero or more whitespaces
  • " - a " char
  • ([^"]*) - Group 2: any zero or more chars other than "
  • .* - the rest of the line.

See the online demo:

#!/bin/bash
s='      "ProxyHost": "ie.review.visa.com",
      "ProxyHost": "ocasta.zendesk.com",
      "ProxyHost": "dev.zemanta.com",
      "ProxyHost": "bharian.api.useinsider.com",
      "ProxyHost": "optout.service.mycard.visa.com",
      "ProxyHost": "ir.newrelic.com",
      "ProxyHost": "metabase.yoast.com",
4:      "ProxyHost": "designdiscoveryya.gsd.harvard.edu",
18:      "ProxyHost": "pls.law.harvard.edu",
32:      "ProxyHost": "view.jquery.com",
46:      "ProxyHost": "www.rmf.harvard.edu",
60:      "ProxyHost": "execed.sph.harvard.edu",
74:      "ProxyHost": "note.microsoft.com",
102:      "ProxyHost": "librarylab.law.harvard.edu",
116:      "ProxyHost": "api.jquery.com",
130:      "ProxyHost": "pmsdn.'
sed -E 's/^([0-9]+:)?[[:space:]]*"[^"]*":[[:space:]]*"([^"]*).*/\2/' <<< "$s"

Output:

ie.review.visa.com
ocasta.zendesk.com
dev.zemanta.com
bharian.api.useinsider.com
optout.service.mycard.visa.com
ir.newrelic.com
metabase.yoast.com
designdiscoveryya.gsd.harvard.edu
pls.law.harvard.edu
view.jquery.com
www.rmf.harvard.edu
execed.sph.harvard.edu
note.microsoft.com
librarylab.law.harvard.edu
api.jquery.com
pmsdn.