Regex using sed or Perl to comment XML Block

Time:12-02

I'm trying to comment out a section of web.xml in OpenAM using only sed or perl -pi (the only tools available to me).

Here is the relevant part of the file:

  <servlet-mapping>
    <servlet-name>AgentConfigurationServlet</servlet-name>
    <url-pattern>/agentconfig/*</url-pattern>
  </servlet-mapping>
  <servlet-mapping>
    <servlet-name>VersionServlet</servlet-name>
    <url-pattern>/ccversion/*</url-pattern>
  </servlet-mapping>
  <servlet-mapping>
    <servlet-name>FSServlet</servlet-name>
    <url-pattern>/federation/*</url-pattern>
  </servlet-mapping>

I want to comment out only one <servlet-mapping>...</servlet-mapping> block: the one containing /ccversion. I tried everything and couldn't make it work.

What I tried:

sed -e "s/(<servlet-mapping>[\r\n] .*[\r\n] .*\/ccversion.*[\r\n] .*)/<\!-- \$1 -->/"

CodePudding user response:

You should use XML-aware tools to process XML. You mention Perl; several XML-handling modules are available for it.

But, if you insist, try the following at your own risk:

perl -0777 -pe 's{.*\K(<servlet-mapping>\s*.*?<url-pattern>/ccversion/.*?</servlet-mapping>)}{<!-- $1 -->}s' file.xml
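  • -0777 enables "slurp mode": Perl reads the whole file as a single string instead of processing it line by line;
  • The initial .*\K matches everything before the <servlet-mapping> we're interested in and then drops it from the match, so only the captured block is replaced. Because .* is greedy, the match starts at the last <servlet-mapping> that is still followed by <url-pattern>/ccversion/, which is the one opening that block;
  • Each .*? needs the question mark ("frugal quantifier"): the first one then matches only up to the nearest <url-pattern>/ccversion/, and the second one only up to the nearest </servlet-mapping>, instead of matching up to the last one;
  • The final s modifier (after the closing }) makes dots also match newlines.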

BTW, in xsh, a wrapper around XML::LibXML that I happen to maintain, the same can be achieved with:

open file.xml ;
for my $sm in //servlet-mapping[url-pattern="/ccversion/*"]
    xinsert comment {"$sm"} replace $sm ;
save :b ;    