How can I make this regex relative URL extraction work in grep?

Time:05-22

I have this string in a file and want to extract just the relative link:

<a href="/FreeCAD/FreeCAD-Bundle/releases/download/weekly-builds/FreeCAD_weekly-builds-28909-2022-05-20-conda-Linux-x86_64-py39.AppImage" rel="nofollow" data-skip-pjax>

This works in https://regexr.com/6m4vg :

/FreeCAD/[^]*AppImage

But it returns nothing in grep:

grep -E '/FreeCAD/\[^]*AppImage' somefile

How can I make it work? Thanks.

Edit: the source file is the page downloaded with:

wget https://github.com/FreeCAD/FreeCAD-Bundle/releases/tag/weekly-builds

Desired output:

/FreeCAD/FreeCAD-Bundle/releases/download/weekly-builds/FreeCAD_weekly-builds-28909-2022-05-20-conda-Linux-x86_64-py39.AppImage

Answer:

You need to use [^"]* instead of [^]*:

grep -o '/FreeCAD/[^"]*AppImage' somefile
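A minimal, self-contained check of the command above. It assumes a POSIX shell and a grep that supports -o (GNU and BSD grep both do, although -o is not part of POSIX). Instead of reading somefile, it feeds the sample line from the question to grep through printf:

```shell
# Sample line copied from the question (stands in for the downloaded page).
line='<a href="/FreeCAD/FreeCAD-Bundle/releases/download/weekly-builds/FreeCAD_weekly-builds-28909-2022-05-20-conda-Linux-x86_64-py39.AppImage" rel="nofollow" data-skip-pjax>'

# -o prints only the matched text; [^"]* cannot cross the closing quote.
result=$(printf '%s\n' "$line" | grep -o '/FreeCAD/[^"]*AppImage')
printf '%s\n' "$result"
```

The printed line should be exactly the desired output shown in the question.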

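/FreeCAD/[^]*AppImage works online because you test the pattern against the ECMAScript engine, but grep -E uses the POSIX ERE regex flavor, where a bracket expression cannot be empty: a ] placed right after [^ is taken as a literal character, so [^]*AppImage is an unterminated bracket expression. Escaping the opening bracket as \[ does not help either: \[ then matches a literal [, and the ^ after it acts as a start-of-line anchor, so the pattern can never match.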
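The two ways the question's pattern fails in grep (unescaped, and with the \[ escape) can be reproduced with a short sketch, assuming a POSIX shell. The href below is a shortened, hypothetical example. POSIX only requires an exit status greater than 1 for an error, and the error message differs between grep implementations, so the sketch records exit statuses rather than messages:

```shell
# Hypothetical, shortened version of the question's line.
line='<a href="/FreeCAD/example.AppImage" rel="nofollow">'

# Unescaped: "]" right after "[^" is literal, so the bracket expression is
# never closed and grep rejects the pattern (exit status > 1).
status_unescaped=0
printf '%s\n' "$line" | grep -E '/FreeCAD/[^]*AppImage' || status_unescaped=$?

# Escaped: "\[" is a literal "[" and the "^" after it is an anchor in ERE,
# so the pattern is valid but can never match (exit status 1).
status_escaped=0
printf '%s\n' "$line" | grep -E '/FreeCAD/\[^]*AppImage' || status_escaped=$?

echo "unescaped: $status_unescaped, escaped: $status_escaped"
```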

In the ECMAScript regex flavor, [^] matches any character, including a line break. Since grep works on a line-by-line basis and never sees a line break inside a line, you can replace [^]* with .* here.
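A quick check of the .* variant on the question's sample line, assuming a POSIX shell and a grep that supports the non-POSIX -o option:

```shell
# Sample line copied from the question.
line='<a href="/FreeCAD/FreeCAD-Bundle/releases/download/weekly-builds/FreeCAD_weekly-builds-28909-2022-05-20-conda-Linux-x86_64-py39.AppImage" rel="nofollow" data-skip-pjax>'

# ".*" plays the role of the ECMAScript "[^]*": a single line never
# contains a newline, so "any character" and "any non-newline" coincide.
dot_result=$(printf '%s\n' "$line" | grep -o '/FreeCAD/.*AppImage')
printf '%s\n' "$dot_result"
```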

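However, since the text you want to match cannot contain a " character, the more appropriate pattern is [^"]*, which matches zero or more characters other than ". Unlike the greedy .*, it cannot run past the closing quote of the href into a later link on the same line.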
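The difference between .* and [^"]* shows up when one line holds more than one link. The sketch below uses a hypothetical two-link line (not taken from the GitHub page) and assumes a POSIX shell and a grep with -o:

```shell
# Hypothetical line with two links on it.
line='<a href="/FreeCAD/a.AppImage">x</a> <a href="/FreeCAD/b.AppImage">y</a>'

# Greedy ".*" runs from the first "/FreeCAD/" to the last "AppImage".
greedy=$(printf '%s\n' "$line" | grep -o '/FreeCAD/.*AppImage')

# "[^"]*" cannot cross a quote, so each link is reported on its own line.
strict=$(printf '%s\n' "$line" | grep -o '/FreeCAD/[^"]*AppImage')

printf 'greedy:\n%s\nstrict:\n%s\n' "$greedy" "$strict"
```

The greedy version returns one match that swallows the HTML between the two links, while the [^"]* version returns the two relative URLs separately.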
