Extract data from curl output using sed, awk, cut or python3

Time:09-17

I am trying to extract one URL from the output of a curl command in the shell.

The curl command I am running is:

curl -s "http://HOSTNAME.com/api/v4/projects/18/merge_requests?source_branch=samm_6819"

which gives output like this:

[{"id":244,"iid":69,"project_id":18,"title":"bug 6819","description":"","state":"merged","created_at":"2021-09-04T06:51:05.988Z","updated_at":"2021-09-04T06:52:03.869Z","merged_by":{"id":4,"name":"SOME NAME ","username":"SOMEUSERNAME","state":"active","avatar_url":"https://www.gravatar.com/avatar/baa4538f891a621a8e5480aa9ac404a6?s=80\u0026d=identicon","web_url":"http://HOSTNAME/SOMEUSERNAME"},"merged_at":"2021-09-04T06:52:03.997Z","closed_by":null,"closed_at":null,"target_branch":"master","source_branch":"samm_6819","user_notes_count":0,"upvotes":0,"downvotes":0,"author":{"id":1,"name":"Administrator","username":"root","state":"active","avatar_url":"https://www.gravatar.com/avatar/81fasf149c17eba1d66803dc0877828900?s=80\u0026d=identicon","web_url":"http://HOSTNAME/root"},"assignees":[],"assignee":null,"reviewers":[],"source_project_id":18,"target_project_id":18,"labels":[],"draft":false,"work_in_progress":false,"milestone":null,"merge_when_pipeline_succeeds":true,"merge_status":"can_be_merged","sha":"1e14427dd70862265b55fsa0e38f4e980d5f65524","merge_commit_sha":"34240a857d7d1a852f9c3d4safa3f031ef3bd35225","squash_commit_sha":null,"discussion_locked":null,"should_remove_source_branch":false,"force_remove_source_branch":false,"reference":"!69","references":{"short":"!69","relative":"!69","full":"SOMEPROJECTNAME-af/test!69"},"web_url":"http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69","time_stats":{"time_estimate":0,"total_time_spent":0,"human_time_estimate":null,"human_total_time_spent":null},"squash":false,"task_completion_status":{"count":0,"completed_count":0},"has_conflicts":false,"blocking_discussions_resolved":true,"approvals_before_merge":null}]

Out of all this data, I need the output to be:

http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69

I need to print this URL elsewhere. I am not sure what to pipe the curl output into (awk, sed, cut, or something else) to get just this URL.

Can someone help me, please? I am running out of ideas about the best way to achieve this.

Thanks

CodePudding user response:

The best option here is jq, which parses the JSON rather than pattern-matching it:

json='[{"id":244,"iid":69,"project_id":18,"title":"bug 6819","description":"","state":"merged","created_at":"2021-09-04T06:51:05.988Z","updated_at":"2021-09-04T06:52:03.869Z","merged_by":{"id":4,"name":"SOME NAME ","username":"SOMEUSERNAME","state":"active","avatar_url":"https://www.gravatar.com/avatar/baa4538f891a621a8e5480aa9ac404a6?s=80\u0026d=identicon","web_url":"http://HOSTNAME/SOMEUSERNAME"},"merged_at":"2021-09-04T06:52:03.997Z","closed_by":null,"closed_at":null,"target_branch":"master","source_branch":"samm_6819","user_notes_count":0,"upvotes":0,"downvotes":0,"author":{"id":1,"name":"Administrator","username":"root","state":"active","avatar_url":"https://www.gravatar.com/avatar/81fasf149c17eba1d66803dc0877828900?s=80\u0026d=identicon","web_url":"http://HOSTNAME/root"},"assignees":[],"assignee":null,"reviewers":[],"source_project_id":18,"target_project_id":18,"labels":[],"draft":false,"work_in_progress":false,"milestone":null,"merge_when_pipeline_succeeds":true,"merge_status":"can_be_merged","sha":"1e14427dd70862265b55fsa0e38f4e980d5f65524","merge_commit_sha":"34240a857d7d1a852f9c3d4safa3f031ef3bd35225","squash_commit_sha":null,"discussion_locked":null,"should_remove_source_branch":false,"force_remove_source_branch":false,"reference":"!69","references":{"short":"!69","relative":"!69","full":"SOMEPROJECTNAME-af/test!69"},"web_url":"http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69","time_stats":{"time_estimate":0,"total_time_spent":0,"human_time_estimate":null,"human_total_time_spent":null},"squash":false,"task_completion_status":{"count":0,"completed_count":0},"has_conflicts":false,"blocking_discussions_resolved":true,"approvals_before_merge":null}]'
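printf '%s\n' "$json" | jq -r '.[].web_url'

Quoting "$json" keeps the shell from word-splitting or glob-expanding the JSON. With the real request, pipe curl straight into jq:

curl -s "http://HOSTNAME.com/api/v4/projects/18/merge_requests?source_branch=samm_6819" | jq -r '.[].web_url'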
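
If jq is not installed, python3 (named in the question title) can do the same parsing with its standard json module. The following is a minimal sketch, not a definitive implementation: extract_web_urls and sample are names made up for this illustration, and sample is a heavily trimmed copy of the response shown in the question.

```python
import json


def extract_web_urls(text):
    """Return the top-level "web_url" of each merge request in a JSON array."""
    merge_requests = json.loads(text)
    # Only the top-level key is read, so the nested "web_url" values under
    # "merged_by" and "author" are ignored.
    return [mr["web_url"] for mr in merge_requests if "web_url" in mr]


# Trimmed stand-in for the curl output (assumption: the same shape as the real response).
sample = (
    '[{"id":244,"iid":69,'
    '"merged_by":{"web_url":"http://HOSTNAME/SOMEUSERNAME"},'
    '"web_url":"http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69"}]'
)

for url in extract_web_urls(sample):
    print(url)
```

To use it on the live response, the same idea fits in a one-liner that reads curl's output from standard input:

curl -s "http://HOSTNAME.com/api/v4/projects/18/merge_requests?source_branch=samm_6819" | python3 -c 'import json, sys; print(json.load(sys.stdin)[0]["web_url"])'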

CodePudding user response:

You can try this sed command:

sed 's/.*:.\(http.[^"]*\).*/\1/'

Because the leading .* is greedy, it matches the last occurrence of http that follows a colon and one more character (here :"), and captures from there up to the next ".
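
To see what "last occurrence" means in practice, the same pattern can be tried in Python's re module. This is only an illustrative sketch: last_http_value, as_returned, and reordered are hypothetical names, and the two strings are shortened stand-ins for the API response, the second with the keys in a different order.

```python
import re

# Same idea as the sed expression: greedy prefix, then ':' + any character + 'http...'.
SED_LIKE = re.compile(r'.*:.(http.[^"]*)')


def last_http_value(text):
    """Return the last http value that follows ':' plus one character, or None."""
    match = SED_LIKE.match(text)
    return match.group(1) if match else None


# Key order as in the question: the merge request URL comes last.
as_returned = (
    '{"author":{"web_url":"http://HOSTNAME/root"},'
    '"web_url":"http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69"}'
)
# Hypothetical reordering: a different URL now comes last.
reordered = (
    '{"web_url":"http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69",'
    '"author":{"web_url":"http://HOSTNAME/root"}}'
)

print(last_http_value(as_returned))  # the merge request URL
print(last_http_value(reordered))    # http://HOSTNAME/root, not the URL that was wanted
```

The pattern works on the output shown only because the wanted URL happens to be the last one in the line; a JSON parser does not depend on key order.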

CodePudding user response:

Using cat file in place of curl ... for demonstration, this works on the input you show with any sed in any shell on every Unix box. The greedy .* makes it pick the last "web_url" on the line, which is the merge request's own:

$ cat file | sed 's/.*"web_url":"\([^"]*\).*/\1/'
http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69

CodePudding user response:

Try Perl:

$ perl -ne ' /.*"web_url":"([^"]+)"/ and print "$1\n" ' sameer.txt
http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69

Another method, which loops over every "web_url" match on the line and prints the last one:

perl -ne ' while( /"web_url":"([^"]+)"/g ) { $x=1; $t=$1 } print "$t\n" if $x; $x=0 ' sameer.txt