Extract data from curl output using sed, awk, cut or python3

I am trying to extract one URL from a curl command's output in the shell.

The curl command I am running is:

curl -s "http://HOSTNAME.com/api/v4/projects/18/merge_requests?source_branch=samm_6819"

which gives output like this:

[{"id":244,"iid":69,"project_id":18,"title":"bug 6819","description":"","state":"merged","created_at":"2021-09-04T06:51:05.988Z","updated_at":"2021-09-04T06:52:03.869Z","merged_by":{"id":4,"name":"SOME NAME ","username":"SOMEUSERNAME","state":"active","avatar_url":"https://www.gravatar.com/avatar/baa4538f891a621a8e5480aa9ac404a6?s=80\u0026d=identicon","web_url":"http://HOSTNAME/SOMEUSERNAME"},"merged_at":"2021-09-04T06:52:03.997Z","closed_by":null,"closed_at":null,"target_branch":"master","source_branch":"samm_6819","user_notes_count":0,"upvotes":0,"downvotes":0,"author":{"id":1,"name":"Administrator","username":"root","state":"active","avatar_url":"https://www.gravatar.com/avatar/81fasf149c17eba1d66803dc0877828900?s=80\u0026d=identicon","web_url":"http://HOSTNAME/root"},"assignees":[],"assignee":null,"reviewers":[],"source_project_id":18,"target_project_id":18,"labels":[],"draft":false,"work_in_progress":false,"milestone":null,"merge_when_pipeline_succeeds":true,"merge_status":"can_be_merged","sha":"1e14427dd70862265b55fsa0e38f4e980d5f65524","merge_commit_sha":"34240a857d7d1a852f9c3d4safa3f031ef3bd35225","squash_commit_sha":null,"discussion_locked":null,"should_remove_source_branch":false,"force_remove_source_branch":false,"reference":"!69","references":{"short":"!69","relative":"!69","full":"SOMEPROJECTNAME-af/test!69"},"web_url":"http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69","time_stats":{"time_estimate":0,"total_time_spent":0,"human_time_estimate":null,"human_total_time_spent":null},"squash":false,"task_completion_status":{"count":0,"completed_count":0},"has_conflicts":false,"blocking_discussions_resolved":true,"approvals_before_merge":null}]

Out of all this data, the only output I need is:

http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69

I need to print this URL somewhere. I am not sure which tool is best here: awk, sed, cut, or something else I can pipe the curl output into to extract this URL.

Can someone please help? I didn't want to ask this here, but I am running out of ideas for the best way to achieve this.

Thanks

Answer:

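The best option here is jq, a command-line JSON processor that actually parses the JSON instead of matching text patterns. You can also pipe the curl output straight into it: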

json='[{"id":244,"iid":69,"project_id":18,"title":"bug 6819","description":"","state":"merged","created_at":"2021-09-04T06:51:05.988Z","updated_at":"2021-09-04T06:52:03.869Z","merged_by":{"id":4,"name":"SOME NAME ","username":"SOMEUSERNAME","state":"active","avatar_url":"https://www.gravatar.com/avatar/baa4538f891a621a8e5480aa9ac404a6?s=80\u0026d=identicon","web_url":"http://HOSTNAME/SOMEUSERNAME"},"merged_at":"2021-09-04T06:52:03.997Z","closed_by":null,"closed_at":null,"target_branch":"master","source_branch":"samm_6819","user_notes_count":0,"upvotes":0,"downvotes":0,"author":{"id":1,"name":"Administrator","username":"root","state":"active","avatar_url":"https://www.gravatar.com/avatar/81fasf149c17eba1d66803dc0877828900?s=80\u0026d=identicon","web_url":"http://HOSTNAME/root"},"assignees":[],"assignee":null,"reviewers":[],"source_project_id":18,"target_project_id":18,"labels":[],"draft":false,"work_in_progress":false,"milestone":null,"merge_when_pipeline_succeeds":true,"merge_status":"can_be_merged","sha":"1e14427dd70862265b55fsa0e38f4e980d5f65524","merge_commit_sha":"34240a857d7d1a852f9c3d4safa3f031ef3bd35225","squash_commit_sha":null,"discussion_locked":null,"should_remove_source_branch":false,"force_remove_source_branch":false,"reference":"!69","references":{"short":"!69","relative":"!69","full":"SOMEPROJECTNAME-af/test!69"},"web_url":"http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69","time_stats":{"time_estimate":0,"total_time_spent":0,"human_time_estimate":null,"human_total_time_spent":null},"squash":false,"task_completion_status":{"count":0,"completed_count":0},"has_conflicts":false,"blocking_discussions_resolved":true,"approvals_before_merge":null}]'
# quote "$json": unquoted, the shell word-splits it and may glob-expand it
printf '%s\n' "$json" | jq -r '.[].web_url'
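If jq isn't installed, python3 (which the question title mentions) can do the same parse with its standard json module. This is a minimal sketch, assuming python3 is on the PATH; the extract_web_url function name and the trimmed-down sample JSON are hypothetical, made up for illustration:

```shell
#!/bin/sh
# Sketch: parse the response with python3's standard json module instead of jq.
# extract_web_url is a hypothetical helper name; it reads JSON on stdin and
# prints the top-level web_url of every merge request in the array.
extract_web_url() {
    python3 -c '
import json, sys
for mr in json.load(sys.stdin):  # the response is a JSON array of merge requests
    print(mr["web_url"])         # top-level key only; nested author/merged_by URLs are ignored
'
}

# Hypothetical trimmed sample standing in for the real curl output.
json='[{"iid":69,"author":{"web_url":"http://HOSTNAME/root"},"web_url":"http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69"}]'
printf '%s\n' "$json" | extract_web_url
```

In practice you would pipe the request itself into it: curl -s "http://HOSTNAME.com/api/v4/projects/18/merge_requests?source_branch=samm_6819" | extract_web_url. Like jq's .[].web_url, it prints one line per merge request and does not pick up the nested web_url fields under author and merged_by.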

Answer:

You can try this sed command:

sed 's/.*:.\(http.[^"]*\).*/\1/'

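Because the leading .* is greedy, it matches the last occurrence of : followed by one character and http, and captures everything from that http up to, but not including, the next double quote. On this output that is the merge request's web_url, but only because it happens to be the last URL in the response.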
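To see the greedy behaviour in isolation, here is a self-contained sketch that runs the same sed expression on a hypothetical trimmed sample in which a nested avatar_url appears before the merge request's web_url. The last_url helper name and the sample data are made up for illustration:

```shell
#!/bin/sh
# Sketch: the sed expression from this answer, wrapped in a hypothetical helper.
# The leading .* is greedy, so the match starts at the LAST ':' + one char + 'http'.
# Note: if a line contains no such URL, sed prints it unchanged.
last_url() {
    sed 's/.*:.\(http.[^"]*\).*/\1/'
}

# Hypothetical sample: an earlier https avatar URL precedes the MR's web_url.
json='[{"author":{"avatar_url":"https://www.gravatar.com/avatar/x?s=80"},"web_url":"http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69","time_stats":{"time_estimate":0}}]'
printf '%s\n' "$json" | last_url
```

Only the web_url is printed, because the earlier avatar URL is skipped by the greedy .* rather than by any knowledge of the JSON keys.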

Answer:

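Using cat file in place of curl ... for demonstration, this works on the input you show with any sed, in any shell, on any Unix box. The greedy .* makes it capture the value of the last "web_url" key, which in this output is the merge request's own URL: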

$ cat file | sed 's/.*"web_url":"\([^"]*\).*/\1/'
http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69
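Since the question also asks about awk, here is a sketch of the same key-anchored idea in awk. It relies on POSIX awk treating a multi-character -F value as an extended regular expression; the web_url_awk helper name and the sample JSON are hypothetical:

```shell
#!/bin/sh
# Sketch: split each line on '"web_url":"', so the text after the LAST such key
# becomes the final field ($NF); the URL is everything in it before the next '"'.
# Lines without the key (NF == 1) print nothing.
web_url_awk() {
    awk -F'"web_url":"' 'NF > 1 { split($NF, rest, "\""); print rest[1] }'
}

# Hypothetical sample: the nested author web_url comes before the MR's web_url.
json='[{"author":{"web_url":"http://HOSTNAME/root"},"web_url":"http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69","time_stats":{}}]'
printf '%s\n' "$json" | web_url_awk
```

Like the sed version, it depends on the merge request's web_url being the last "web_url" key on the line.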

Answer:

Try Perl:

$ perl -ne ' /.*"web_url":"([^"]+)"/ and print "$1\n" ' sameer.txt
http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69

Another method, which loops over every "web_url" match on the line and prints the last one:

perl -ne ' while( /"web_url":"([^"]+)"/g ) { $x=1; $t=$1 } print "$t\n" if $x; $x=0 ' sameer.txt
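The question also mentions cut. A cut-based variant needs something to isolate the key first; this sketch assumes a grep that supports -o (GNU and BSD grep do, though it is not in POSIX). The web_url_cut helper name and the sample JSON are hypothetical:

```shell
#!/bin/sh
# Sketch: isolate each "web_url":"..." pair, keep the last one, then cut out the URL.
web_url_cut() {
    grep -o '"web_url":"[^"]*"' |  # one '"web_url":"..."' match per output line
        tail -n 1 |                # keep the last one: the merge request's own URL here
        cut -d'"' -f4              # fields split on '"': "", web_url, :, URL, ""
}

# Hypothetical sample: the nested merged_by web_url comes before the MR's web_url.
json='[{"merged_by":{"web_url":"http://HOSTNAME/SOMEUSERNAME"},"web_url":"http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69"}]'
printf '%s\n' "$json" | web_url_cut
```

As with the sed, awk and Perl answers, tail -n 1 makes this depend on the merge request's URL coming last; if the response ever contains several merge requests, only the last one is printed, whereas jq's .[].web_url prints them all.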