I am just simply trying to grab the commit ID, but not quite sure what I'm missing:
➜ ~ curl https://github.com/microsoft/vscode/releases -s | grep -oE 'microsoft/vscode/commit/(.*?)/hovercard'
microsoft/vscode/commit/ccbaa2d27e38e5afa3e5c21c1c7bef4657064247/hovercard
The only thing I need back from this is ccbaa2d27e38e5afa3e5c21c1c7bef4657064247.
This works just fine on regex101.com and in ruby/python. What am I missing?
CodePudding user response:
If supported, you can use grep -oP
echo "microsoft/vscode/commit/ccbaa2d27e38e5afa3e5c21c1c7bef4657064247/hovercard" | grep -oP "microsoft/vscode/commit/\K.*?(?=/hovercard)"
Output
ccbaa2d27e38e5afa3e5c21c1c7bef4657064247
Another option is to use sed with a capture group:
echo "microsoft/vscode/commit/ccbaa2d27e38e5afa3e5c21c1c7bef4657064247/hovercard" | sed -E 's/microsoft\/vscode\/commit\/([^\/]+)\/hovercard/\1/'
Output
ccbaa2d27e38e5afa3e5c21c1c7bef4657064247
CodePudding user response:
The point is that grep does not support extracting capturing group submatches. If you install pcregrep, you could do that with
curl https://github.com/microsoft/vscode/releases -s | \
pcregrep -o1 'microsoft/vscode/commit/(.*?)/hovercard' | head -1
The | head -1 part is to fetch the first occurrence only.
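To see the limitation concretely, here is a minimal sketch that runs grep -o on a single sample line and then trims the fixed prefix and suffix with POSIX parameter expansion. The trimming step is not from the answers here; it is just one way to finish the job in the shell itself. The variable names and the sample line are hypothetical stand-ins for the downloaded page. POSIX extended regular expressions (-E) also do not define the lazy .*? quantifier known from PCRE, Ruby, and Python, so the sketch uses [^/]+ instead.

```shell
# Sample line (hypothetical stand-in for one line of the downloaded page).
line='href="/microsoft/vscode/commit/ccbaa2d27e38e5afa3e5c21c1c7bef4657064247/hovercard"'

# grep -o prints the whole match, prefix and suffix included.
match=$(printf '%s\n' "$line" | grep -oE 'microsoft/vscode/commit/[^/]+/hovercard')
echo "$match"

# Trim the fixed prefix and suffix with POSIX parameter expansion.
commit=${match#microsoft/vscode/commit/}
commit=${commit%/hovercard}
echo "$commit"
```

The second echo should print only the 40-character commit ID.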
I would suggest using awk here:
awk 'match($0,/microsoft\/vscode\/commit\/[^\/]*\/hovercard/){print substr($0,RSTART+24,RLENGTH-34);exit}'
The regex will match a line containing
microsoft\/vscode\/commit\/ - a microsoft/vscode/commit/ fixed string
[^\/]* - zero or more chars other than /
\/hovercard - a /hovercard string.
The substr($0,RSTART+24,RLENGTH-34) will print the part of the line starting at the RSTART+24 index (24 is the length of microsoft/vscode/commit/). The length of that part is RLENGTH-34, where 34 is the length of microsoft/vscode/commit/ (24) plus the length of /hovercard (10).
The exit command will fetch you the first occurrence. Remove it if you need all occurrences.
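As a worked check of the offsets, the following sketch runs the same awk match on one sample line but derives 24 and 34 from length() instead of hard-coding them. The names line, commit, prefix, and suffix are illustrative, and the sample line is a hypothetical stand-in for the page content.

```shell
# Sample line (hypothetical stand-in for one line of the downloaded page).
line='<a data-hovercard-url="/microsoft/vscode/commit/ccbaa2d27e38e5afa3e5c21c1c7bef4657064247/hovercard">'

commit=$(printf '%s\n' "$line" | awk '
  BEGIN { prefix = "microsoft/vscode/commit/"; suffix = "/hovercard" }
  match($0, /microsoft\/vscode\/commit\/[^\/]*\/hovercard/) {
    # Skip the prefix (24 chars) and drop prefix + suffix (34 chars) from the match length.
    print substr($0, RSTART + length(prefix), RLENGTH - length(prefix) - length(suffix))
    exit
  }')
echo "$commit"
```

Computing the lengths this way keeps the offsets correct if the prefix or suffix ever changes.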
CodePudding user response:
You can use sed:
curl -s https://github.com/microsoft/vscode/releases |
sed -En 's=.*microsoft/vscode/commit/([^/]+)/hovercard.*=\1=p' |
head -n 1
- head -n 1 is to print the first match (there are 10).
- grep -o will print (only) everything that matches, including microsoft/ etc.
- Your task cannot be achieved with Mac's grep. grep -o prints all matching text (compared to the default behaviour of printing matching lines), including microsoft/ etc. A grep that implements Perl regex (like GNU grep on Linux) could make use of lookahead/lookbehind (grep -Po '(?<=microsoft/vscode/commit/)[^/]+(?=/hovercard)'). But that is just not available in Mac's grep.
CodePudding user response:
On macOS you don't have GNU utilities available by default. You can just pipe your output to a simple awk like this:
curl https://github.com/microsoft/vscode/releases -s |
grep -oE 'microsoft/vscode/commit/[^/]+/hovercard' |
awk -F/ '{print $(NF-1)}'
ccbaa2d27e38e5afa3e5c21c1c7bef4657064247
3a6960b964327f0e3882ce18fcebd07ed191b316
f4af3cbf5a99787542e2a30fe1fd37cd644cc31f
b3318bc0524af3d74034b8bb8a64df0ccf35549a
6cba118ac49a1b88332f312a8f67186f7f3c1643
c13f1abb110fc756f9b3a6f16670df9cd9d4cf63
ee8c7def80afc00dd6e593ef12f37756d8f504ea
7f6ab5485bbc008386c4386d08766667e155244e
83bd43bc519d15e50c4272c6cf5c1479df196a4d
e7d7e9a9348e6a8cc8c03f877d39cb72e5dfb1ff
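If only the first ID is needed, the awk step can stop after one line. The sketch below shows how the field split works on two of the matches above, fed in from a variable instead of curl. The variable names matches and first are illustrative.

```shell
# Two matches, one per line, as grep -oE would emit them.
matches='microsoft/vscode/commit/ccbaa2d27e38e5afa3e5c21c1c7bef4657064247/hovercard
microsoft/vscode/commit/3a6960b964327f0e3882ce18fcebd07ed191b316/hovercard'

# Splitting on "/" gives: microsoft, vscode, commit, <id>, hovercard (NF = 5),
# so $(NF-1) is the commit ID. exit stops after the first line.
first=$(printf '%s\n' "$matches" | awk -F/ '{ print $(NF-1); exit }')
echo "$first"
```

Dropping the exit prints every ID, one per line, as in the output above.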