How to grep only matching string from this result?

Time:11-30

I am just simply trying to grab the commit ID, but not quite sure what I'm missing:

➜  ~ curl https://github.com/microsoft/vscode/releases -s | grep -oE 'microsoft/vscode/commit/(.*?)/hovercard'
microsoft/vscode/commit/ccbaa2d27e38e5afa3e5c21c1c7bef4657064247/hovercard

The only thing I need back from this is ccbaa2d27e38e5afa3e5c21c1c7bef4657064247.

This works just fine on regex101.com and in Ruby/Python. What am I missing?

User response:

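If your grep supports Perl-compatible regular expressions (the -P option, available in GNU grep), you can use grep -oP. Here \K drops everything matched so far from the printed result, and (?=/hovercard) is a lookahead that requires /hovercard to follow without including it in the match.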

echo "microsoft/vscode/commit/ccbaa2d27e38e5afa3e5c21c1c7bef4657064247/hovercard" | grep -oP "microsoft/vscode/commit/\K.*?(?=/hovercard)"

Output

ccbaa2d27e38e5afa3e5c21c1c7bef4657064247

Another option is to use sed with a capture group

echo "microsoft/vscode/commit/ccbaa2d27e38e5afa3e5c21c1c7bef4657064247/hovercard" | sed -E 's/microsoft\/vscode\/commit\/([^\/]+)\/hovercard/\1/'

Output

ccbaa2d27e38e5afa3e5c21c1c7bef4657064247
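If only a POSIX shell is at hand, the same prefix/suffix stripping can be sketched with parameter expansion instead of sed (a swapped-in technique, not part of this answer). It assumes the input is a single grep -o match like the one printed in the question:

```shell
# Sketch: assumes $match holds exactly one grep -o match from the question.
match='microsoft/vscode/commit/ccbaa2d27e38e5afa3e5c21c1c7bef4657064247/hovercard'

id=${match#microsoft/vscode/commit/}   # remove the fixed prefix
id=${id%/hovercard}                    # remove the fixed suffix
printf '%s\n' "$id"                    # prints ccbaa2d27e38e5afa3e5c21c1c7bef4657064247
```

To handle every line of the grep -oE output, the two expansions can be placed inside a `while IFS= read -r match; do ...; done` loop.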

User response:

The point is that grep cannot print the contents of a capturing group, only the whole match. If you install pcregrep, you can do that with

curl https://github.com/microsoft/vscode/releases -s | \
  pcregrep -o1 'microsoft/vscode/commit/(.*?)/hovercard' | head -1

Here -o1 prints only what capturing group 1 matched, and the | head -1 part keeps the first occurrence only.

I would suggest using awk here:

curl -s https://github.com/microsoft/vscode/releases |
awk 'match($0,/microsoft\/vscode\/commit\/[^\/]*\/hovercard/){print substr($0,RSTART+24,RLENGTH-34);exit}'

The regex matches a line containing

  • microsoft\/vscode\/commit\/ - the fixed string microsoft/vscode/commit/
  • [^\/]* - zero or more chars other than /
  • \/hovercard - the string /hovercard.

substr($0,RSTART+24,RLENGTH-34) prints the part of the line that starts at index RSTART+24 (24 is the length of microsoft/vscode/commit/) and is RLENGTH-34 characters long, where 34 is the length of microsoft/vscode/commit/ plus the length of /hovercard (24 + 10).

The exit command makes awk stop after the first occurrence. Remove it if you need all occurrences (one per line, since match() only finds the first match on each line).
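To avoid hard-coding 24 and 34, the offsets can be derived from the prefix and suffix strings themselves. A minimal sketch, assuming the same prefix and suffix as the answer; the function name extract_commit and the sample input lines are hypothetical:

```shell
# Hypothetical helper: same match()/substr() idea, offsets computed with length().
extract_commit() {
  awk -v pre="microsoft/vscode/commit/" -v suf="/hovercard" '
    # pre and suf contain no regex metacharacters, so they can be
    # concatenated into a dynamic regex as they are.
    match($0, pre "[^/]*" suf) {
      start = RSTART + length(pre)                 # skip the prefix
      len   = RLENGTH - length(pre) - length(suf)  # drop prefix and suffix
      print substr($0, start, len)
      exit                                         # first occurrence only
    }'
}

# Made-up sample input standing in for the curl output.
printf '%s\n' \
  '<a href="/foo">no commit here</a>' \
  '<a data-hovercard-url="/microsoft/vscode/commit/ccbaa2d27e38e5afa3e5c21c1c7bef4657064247/hovercard">' \
  | extract_commit
# prints ccbaa2d27e38e5afa3e5c21c1c7bef4657064247
```

With the live page, the sample printf would be replaced by `curl -s https://github.com/microsoft/vscode/releases`.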

User response:

You can use sed:

curl -s https://github.com/microsoft/vscode/releases |
sed -En 's=.*microsoft/vscode/commit/([^/]+)/hovercard.*=\1=p' |
head -n 1
  • head -n 1 prints only the first match (there are 10).
  • Your task cannot be achieved with Mac's grep. grep -o prints all matching text (as opposed to its default behaviour of printing matching lines), including microsoft/ etc. A grep that implements Perl regexes (like GNU grep on Linux) could use lookbehind/lookahead (grep -Po '(?<=microsoft/vscode/commit/)[^/]+(?=/hovercard)'), but that option is just not available in Mac's grep.
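A small offline sketch of the sed command above, using made-up sample lines instead of the live GitHub page. -n turns off sed's automatic printing and the p flag prints a line only when the substitution succeeded, so lines without a commit link produce no output:

```shell
# Made-up sample input: one line without a link, one with a single link,
# and one with two links.
sample='<p>no link on this line</p>
<a href="/microsoft/vscode/commit/abc123/hovercard">
<a href="/microsoft/vscode/commit/aaaa/hovercard"><a href="/microsoft/vscode/commit/bbbb/hovercard">'

printf '%s\n' "$sample" |
  sed -En 's=.*microsoft/vscode/commit/([^/]+)/hovercard.*=\1=p'
# prints abc123, then bbbb
```

Because the leading .* is greedy, a line that contains several links yields only the last ID (bbbb above), so this works best when each link sits on its own line.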

User response:

On macOS you don't have GNU utilities available by default, so grep has no -P option. You can just pipe your output to a simple awk like this:

curl https://github.com/microsoft/vscode/releases -s |
grep -oE 'microsoft/vscode/commit/[^/]+/hovercard' |
awk -F/ '{print $(NF-1)}'

ccbaa2d27e38e5afa3e5c21c1c7bef4657064247
3a6960b964327f0e3882ce18fcebd07ed191b316
f4af3cbf5a99787542e2a30fe1fd37cd644cc31f
b3318bc0524af3d74034b8bb8a64df0ccf35549a
6cba118ac49a1b88332f312a8f67186f7f3c1643
c13f1abb110fc756f9b3a6f16670df9cd9d4cf63
ee8c7def80afc00dd6e593ef12f37756d8f504ea
7f6ab5485bbc008386c4386d08766667e155244e
83bd43bc519d15e50c4272c6cf5c1479df196a4d
e7d7e9a9348e6a8cc8c03f877d39cb72e5dfb1ff
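The field arithmetic behind $(NF-1) can be checked on the match from the question. With -F/ every / separates fields, so the match splits into 5 fields and the ID is the fourth. cut is shown as an alternative not used in this answer, assuming the prefix always has exactly three components:

```shell
match='microsoft/vscode/commit/ccbaa2d27e38e5afa3e5c21c1c7bef4657064247/hovercard'

# Print the field count and the second-to-last field.
printf '%s\n' "$match" | awk -F/ '{ print NF, $(NF-1) }'
# prints: 5 ccbaa2d27e38e5afa3e5c21c1c7bef4657064247

# Same field selected by its fixed position.
printf '%s\n' "$match" | cut -d/ -f4
# prints: ccbaa2d27e38e5afa3e5c21c1c7bef4657064247
```

$(NF-1) counts from the end, so it keeps working if the prefix gains extra path components; cut -f4 does not.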