Shell Script: Retrieve Data from an HTML Tag

I want to extract the value between </em> and </a> (4,519 in the line below) with a shell script. How can I do that?

id='idusedMemory' alt='graph'/></em>4,519</a> Mb / 64,309 Mb&nbsp;&nbsp;&nbsp;</td><td>

CodePudding user response：

Use a grep that supports the -P (Perl-compatible regex) flag, such as GNU grep:

grep -Po '(?<=</em>).*(?=</a>)' file

echo "id='idusedMemory' alt='graph'/></em>4,519</a> Mb / 64,309 Mb&nbsp;&nbsp;&nbsp;</td><td>" | grep -Po '(?<=</em>).*(?=</a>)'
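The .* in that pattern is greedy, so if a line ever holds more than one </em>...</a> pair it matches from the first </em> to the last </a>. A minimal sketch of a stricter variant, assuming GNU grep with -P and assuming the value itself never contains a < character (extract_all is a hypothetical helper name, not part of the original answer):

```shell
# extract_all (hypothetical name): print every value found between </em> and </a>,
# one per line. Requires a grep built with -P (PCRE) support, e.g. GNU grep.
extract_all() {
  # \K discards the "</em>" prefix from the reported match;
  # [^<]* stops at the next tag instead of running to the last </a> on the line.
  grep -Po '</em>\K[^<]*(?=</a>)'
}

printf '%s\n' "</em>4,519</a> Mb / </em>64,309</a> Mb" | extract_all
```

With the two-pair input above, each value is printed on its own line rather than as one long match.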

As suggested in the comments, don't parse HTML/XML with tools like grep. Use a utility that is designed for parsing such files.

CodePudding user response：

Just use grep with the -o switch to print only the matching part of each line:

grep -o "</em>.*</a>" test.txt

.* matches any character, any number of times (including zero).
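Note that -o prints the whole match, so the output of the command above still contains the </em> and </a> tags. A minimal sketch of stripping them with sed, assuming a grep that supports -o (GNU and BSD grep do) and that the value contains no < character (strip_tags is a hypothetical helper name):

```shell
# strip_tags (hypothetical name): read lines on stdin and print only the text
# that sits between </em> and </a>.
strip_tags() {
  # grep -o isolates "</em>VALUE</a>"; sed then removes the leading and trailing tag.
  grep -o '</em>[^<]*</a>' | sed -e 's@^</em>@@' -e 's@</a>$@@'
}

line="id='idusedMemory' alt='graph'/></em>4,519</a> Mb / 64,309 Mb&nbsp;&nbsp;&nbsp;</td><td>"
printf '%s\n' "$line" | strip_tags
```

For a file, the same helper can be fed with strip_tags < test.txt.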

CodePudding user response：

If your HTML string contains only one substring like that, you can use a regular expression with sed:

echo "id='idusedMemory' alt='graph'/></em>4,519</a> Mb / 64,309 Mb&nbsp;&nbsp;&nbsp;</td><td>" | sed -rn 's@^.*</em>(.*)</a>.*$@\1@p'

Output:

4,519
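If the line is already in a shell variable, the same single-match extraction can be done without sed or grep. A minimal sketch using POSIX parameter expansion (extract_between and its three arguments are hypothetical names chosen for illustration):

```shell
# extract_between (hypothetical name): print the text between the first
# occurrence of an opening marker and the next closing marker.
#   $1 = input string, $2 = opening marker, $3 = closing marker
extract_between() {
  rest=${1#*"$2"}                   # drop everything up to and including the opening marker
  [ "$rest" = "$1" ] && return 1    # opening marker not found
  printf '%s\n' "${rest%%"$3"*}"    # drop the first closing marker and everything after it
}

line="id='idusedMemory' alt='graph'/></em>4,519</a> Mb / 64,309 Mb&nbsp;&nbsp;&nbsp;</td><td>"
extract_between "$line" '</em>' '</a>'
```

The markers are quoted inside the expansions so they are treated as literal text rather than glob patterns. If the closing marker is missing, this sketch prints everything after the opening marker.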

If you have something more complicated, you may want to look into parsing XML in bash.

Hope that helps.