How can I prevent sed from inserting blanks?

Time:09-22

I wrote this code to extract the software version from one file and overwrite it in another:

newVersion=$(sed -r -n 's/<version>(.*-SNAPSHOT)<\/version>/\1/p' sa-pom.xml)
find ./pom.xml -type f -exec sed -r -i -e "s/<version>(.*-SNAPSHOT)<\/version>/<version>${newVersion}<\/version>/g" {} \;
echo '<version>'$newVersion'</version>'

It works, but it puts one space in the support variable and three spaces in the target file, producing the following outputs, respectively:

<version> 0.19.6-SNAPSHOT</version>
<version>   0.19.6-SNAPSHOT</version>

This is a cut version of the sa-pom.xml file:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>my-group-id</groupId>
    <artifactId>my-artifact-id</artifactId>
      <version>0.19.9-SNAPSHOT</version>

    <packaging>jar</packaging>

    <name>my-project-name</name>

</project>

This is a cut version of the pom.xml file:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>my-group-id</groupId>
        <artifactId>my-parent-artifact-id</artifactId>
        <version>${revision}</version>
    </parent>

    <artifactId>my-artifact-id</artifactId>
      <version>0.19.8-SNAPSHOT</version>

    <packaging>jar</packaging>

    <name>my-project-name</name>

</project>

How can this be solved?

CodePudding user response:

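sed is not adding any spaces here. The substitution replaces only the text the regex matches, so whitespace that was already on the line stays: indentation before the <version> tag is left untouched, and any whitespace between the tag and the version number is captured and carried into the replacement. The fix is to extend your regex so that it consumes that whitespace outside the capture group.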

newVersion=$(sed -n -r 's%.*<version>[[:space:]]*(.*-SNAPSHOT)</version>.*%\1%p' sa-pom.xml)
sed -r -i "s%<version>[[:space:]]*(.*-SNAPSHOT)</version>%<version>${newVersion}</version>%" pom.xml
echo "<version>$newVersion</version>"

Adding .* before the <version> removes the leading spaces from the line (and any other text before the <version> tag). I also added .* after </version> to trim off any text after the closing tag, just to keep this robust.
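
To make the effect of the surrounding .* visible, here is a small sketch that runs both patterns against a single indented line taken from the sa-pom.xml excerpt. The variable names line, before and after are only for illustration, and -E is used as the portable spelling of GNU sed's -r.

```shell
# A sample line with six leading spaces, as in the sa-pom.xml excerpt.
line='      <version>0.19.9-SNAPSHOT</version>'

# Original pattern: only the tag itself is matched and replaced,
# so the indentation in front of it survives in the output.
before=$(printf '%s\n' "$line" |
    sed -E -n 's%<version>(.*-SNAPSHOT)</version>%\1%p')

# Pattern with .* on both sides: the whole line is matched,
# so the whole line is replaced by the captured version.
after=$(printf '%s\n' "$line" |
    sed -E -n 's%.*<version>[[:space:]]*(.*-SNAPSHOT)</version>.*%\1%p')

printf '[%s]\n' "$before"   # [      0.19.9-SNAPSHOT]
printf '[%s]\n' "$after"    # [0.19.9-SNAPSHOT]
```

The brackets in the printf format are there only to make leading whitespace easy to see.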

Adding [[:space:]]* before the capture ensures that no leading spaces will be included in the captured expression, because the regex engine will have skipped as many as possible, and will never need to backtrack from there to get a match (if it backtracks, it's because it can't find a match at all). If that sounds too complicated, let's just state more broadly that the regex engine prefers the leftmost-longest match, with earlier parts of the pattern claiming as much text as they can, and so matching the spaces outside the capturing group keeps them out of it.
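
As a rough illustration of that point, the sketch below feeds sed a line where the whitespace sits inside the tag (as in the asker's second output line) and compares the capture with and without [[:space:]]*. The variable names are made up for the example.

```shell
# Whitespace between the opening tag and the version number.
line='<version>   0.19.6-SNAPSHOT</version>'

# Without [[:space:]]*, the group (.*-SNAPSHOT) swallows the spaces.
greedy=$(printf '%s\n' "$line" |
    sed -E -n 's%<version>(.*-SNAPSHOT)</version>%\1%p')

# With [[:space:]]* in front, the spaces are matched outside the group.
trimmed=$(printf '%s\n' "$line" |
    sed -E -n 's%<version>[[:space:]]*(.*-SNAPSHOT)</version>%\1%p')

printf '[%s]\n' "$greedy"    # [   0.19.6-SNAPSHOT]
printf '[%s]\n' "$trimmed"   # [0.19.6-SNAPSHOT]
```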

The find also seemed entirely superfluous here, since it is only ever given the single file ./pom.xml. You'll notice that I also switched the delimiter to s%..%..% so as to avoid having to backslash-escape the slashes. The /g flag would appear unnecessary, too (unless you really expect multiple matches per line, but then you could not use .* in the search because it would eat all the text between the first match and the last). And the -e isn't really necessary if your script only consists of a single string (and doesn't start with a dash). Finally, I fixed the quoting in the echo: the unquoted $newVersion was subject to word splitting, which collapses leading whitespace.
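
The quoting fix matters more than it might look. A minimal sketch, assuming newVersion still carries leading spaces as it did with the original capture: unquoted, the shell splits the expansion into words and echo joins them with a single space, which would explain the one-space output in the question.

```shell
# Assumed value: what the original capture could have produced.
newVersion='   0.19.6-SNAPSHOT'

# Unquoted expansion: word splitting turns the three spaces into a
# word boundary, and echo prints the two words with one space between.
unquoted=$(echo '<version>'$newVersion'</version>')

# Quoted expansion: the value is passed through byte for byte.
quoted=$(echo "<version>$newVersion</version>")

printf '%s\n' "$unquoted"   # <version> 0.19.6-SNAPSHOT</version>
printf '%s\n' "$quoted"     # <version>   0.19.6-SNAPSHOT</version>
```

So the unquoted echo hides part of the problem rather than showing the real contents of the variable.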

This is still quite brittle; ideally, use an XML-aware tool to parse values out of XML files.
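
One possible XML-aware approach, sketched here with xmllint from libxml2 (an assumption; it may not be installed on your system, and other tools would do equally well). The temporary file is a trimmed stand-in for sa-pom.xml, and local-name() is used to sidestep the default Maven namespace in the XPath expression.

```shell
# Minimal stand-in for sa-pom.xml, written to a temporary file.
pom=$(mktemp)
cat > "$pom" <<'EOF'
<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <artifactId>my-artifact-id</artifactId>
      <version>0.19.9-SNAPSHOT</version>
</project>
EOF

if command -v xmllint >/dev/null 2>&1; then
    # Select only the <version> that is a direct child of <project>;
    # string() returns its text, so indentation in the file is irrelevant.
    newVersion=$(xmllint --xpath \
        "string(/*[local-name()='project']/*[local-name()='version'])" "$pom")
    echo "<version>$newVersion</version>"
else
    echo "xmllint not found; skipping" >&2
fi

rm -f "$pom"
```

Because the expression addresses the element by its position in the document, a <version> inside <parent> is not picked up by accident.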
