How to use `sed` command to get a text between two patterns spanning across multiple lines?

Time:07-26

I have a file pom.xml with the following content:

...
    <artifactId>test-module</artifactId>
    <version>1.0.0-SNAPSHOT</version>
    <packaging>pom</packaging>

    <parent>
        <groupId>com.example</groupId>
        <artifactId>parent-module</artifactId>
        <version>1.1.1</version>
    </parent>
...

I'm interested in obtaining just the version of the test-module, namely, 1.0.0-SNAPSHOT. I've tried running this command but it doesn't seem to give me the desired result:

sed -e 's/.*<artifactId>test-module<\/artifactId>\s+<version>\(.*\)<\/version>.*/\1/' -e 't' -e 'd' pom.xml

I based that attempt on what I observed when running this simpler command:

sed -e 's/.*<version>\(.*\)<\/version>.*/\1/' -e 't' -e 'd' pom.xml

which produces this output:

1.0.0-SNAPSHOT
1.1.1

Any help would be appreciated! Thank you!

CodePudding user response:

Assuming the <version> line always immediately follows the <artifactId> line, you can grep for the test-module line plus the one line after it (-A1), then extract the version:

❯ cat stackoverflow.txt | grep -A1 '<artifactId>test-module</artifactId>' | sed -n 's,<version>\(.*\)</version>,\1,p'
    1.0.0-SNAPSHOT
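
As a hedged sketch of this approach, the script below rebuilds the fragment from the question in a temporary file (the `pom` variable and the heredoc are assumptions for illustration) and runs the same grep/sed pipeline. The only change is an added `.*` on either side of the pattern, so the leading indentation is dropped from the result.

```shell
# Hypothetical sample file reproducing the fragment shown in the question.
pom=$(mktemp)
cat > "$pom" <<'EOF'
    <artifactId>test-module</artifactId>
    <version>1.0.0-SNAPSHOT</version>
    <packaging>pom</packaging>

    <parent>
        <groupId>com.example</groupId>
        <artifactId>parent-module</artifactId>
        <version>1.1.1</version>
    </parent>
EOF

# Keep the matching artifactId line plus one line after it (-A1),
# then print only the text between the <version> tags.
version=$(grep -A1 '<artifactId>test-module</artifactId>' "$pom" |
  sed -n 's,.*<version>\(.*\)</version>.*,\1,p')

echo "$version"
rm -f "$pom"
```

This still relies on <version> sitting on the line directly after the artifactId; if the elements are reordered, nothing is printed.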

CodePudding user response:

The branching is more trouble than it's worth. For the limited input example here:

sed -En '/ *<.?version>/{s///gp;q}' pom.xml

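  • sed -En: -E enables extended regular expressions, so ? means 0 or 1 matches; -n suppresses automatic printing, so only p prints.
  • / *<.?version>/: match lines containing an opening or closing version tag, optionally preceded by whitespace.
  • s///gp: delete every tag match on that line, then print the result. (An empty regex // reuses the most recently used regex.)
  • q: quit after the first matching line, so later <version> lines are never reached.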
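
To see the breakdown above in action, here is a small sketch that runs the command against a copy of the fragment from the question (the temporary file is an assumption for illustration, and the `{...;q}` syntax assumes GNU sed). Because that command stops at the first <version> line in the file, a second variant, which is not from the answer, anchors on the test-module artifactId line and uses sed's n command to step to the following line. It assumes the version is on the very next line.

```shell
# Hypothetical sample file reproducing the fragment shown in the question.
pom=$(mktemp)
cat > "$pom" <<'EOF'
    <artifactId>test-module</artifactId>
    <version>1.0.0-SNAPSHOT</version>
    <packaging>pom</packaging>

    <parent>
        <groupId>com.example</groupId>
        <artifactId>parent-module</artifactId>
        <version>1.1.1</version>
    </parent>
EOF

# Command from the answer: strip the tags from the first <version> line, then quit.
first=$(sed -En '/ *<.?version>/{s///gp;q}' "$pom")

# Variant (assumption: <version> is on the line right after the artifactId line).
# n loads the next line; the substitution prints only if that line holds a version.
anchored=$(sed -n '/<artifactId>test-module<\/artifactId>/{n;s/.*<version>\(.*\)<\/version>.*/\1/p;q;}' "$pom")

echo "$first"
echo "$anchored"
rm -f "$pom"
```

On this input both commands print the same value. They would differ only if another <version> element appeared before the test-module one.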

Or just take your existing command and append | head -n1 to keep only the first line of its output.

Or: grep -Pom1 '(?<=version>)[^<]+' pom.xml
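
A minimal sketch of these last two alternatives, run against a temporary copy of the fragment from the question (the file and variable names are assumptions). The grep pattern is written with `[^<]+`, on the assumption that a quantifier is intended after the bracket expression, and -P requires a grep built with PCRE support, such as GNU grep.

```shell
# Hypothetical sample file reproducing the fragment shown in the question.
pom=$(mktemp)
cat > "$pom" <<'EOF'
    <artifactId>test-module</artifactId>
    <version>1.0.0-SNAPSHOT</version>
    <packaging>pom</packaging>

    <parent>
        <groupId>com.example</groupId>
        <artifactId>parent-module</artifactId>
        <version>1.1.1</version>
    </parent>
EOF

# The working command from the question, cut down to its first output line.
via_head=$(sed -e 's/.*<version>\(.*\)<\/version>.*/\1/' -e 't' -e 'd' "$pom" | head -n1)

# -P: Perl-compatible regex, -o: print only the match, -m1: stop after one matching line.
# The lookbehind (?<=version>) requires the tag text but leaves it out of the output.
via_grep=$(grep -Pom1 '(?<=version>)[^<]+' "$pom")

echo "$via_head"
echo "$via_grep"
rm -f "$pom"
```

Both rely on the test-module version being the first <version> element in the file.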
