How to filter all the paths from the urls using "sed" or "grep"

Time:07-22

I am trying to strip the file names from a list of URLs and keep only the directory paths.

echo -e "http://sub.domain.tld/secured/database_connect.php\nhttp://sub.domain.tld/section/files/image.jpg\nhttp://sub.domain.tld/.git/audio-files/top-secret/audio.mp3" | grep -Eio "(http|https)://[^/\"]+" | sort -u

http://sub.domain.tld

But I want the result to look like this:

http://sub.domain.tld/secured/
http://sub.domain.tld/section/files/
http://sub.domain.tld/.git/audio-files/top-secret/

Is there any way to do this with sed or grep?

CodePudding user response:

Using grep

$ echo ... | grep -o '.*/'
http://sub.domain.tld/secured/
http://sub.domain.tld/section/files/
http://sub.domain.tld/.git/audio-files/top-secret/
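
For anyone who wants to try this without retyping the question's echo command, here is a minimal self-contained sketch. It assumes a grep that supports -o (GNU and BSD grep both do); the variable names urls and dirs are only for illustration.

```shell
# Sample input: the three URLs from the question, one per line.
urls='http://sub.domain.tld/secured/database_connect.php
http://sub.domain.tld/section/files/image.jpg
http://sub.domain.tld/.git/audio-files/top-secret/audio.mp3'

# -o prints only the matched text. '.*/' is greedy, so each match
# runs from the start of the line through the LAST slash on that line.
dirs=$(printf '%s\n' "$urls" | grep -o '.*/')
printf '%s\n' "$dirs"
```

Because the pattern is not anchored to a scheme, it keeps everything up to the last slash on any line, URL or not.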

CodePudding user response:

with grep

If your grep has the -o option:

... | grep -Eio 'https?://.*/'

If there could be multiple URLs per line:

... | grep -Eio 'https?://[^[:space:]]+/'
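
To see why the second pattern matters, here is a small sketch comparing the two on a line that holds two URLs. The input line and the a.example / b.example hosts are made up for illustration and are not from the question.

```shell
# Hypothetical input: one line with two URLs and some surrounding text.
line='see http://a.example/x/y.txt and HTTPS://b.example/z/w.png here'

# '.*' may cross whitespace, so the match runs from the first URL
# to the last slash anywhere on the line.
greedy=$(printf '%s\n' "$line" | grep -Eio 'https?://.*/')

# '[^[:space:]]+' cannot cross whitespace, so each URL is matched
# separately and each match ends at that URL's own last slash.
found=$(printf '%s\n' "$line" | grep -Eio 'https?://[^[:space:]]+/')

printf 'greedy: %s\n' "$greedy"
printf '%s\n' "$found"
```

The -i flag is what lets the upper-case HTTPS scheme match here.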

with sed

If the input is always precisely one URL per line and nothing else, you can just delete the filename part:

... | sed 's/[^/]*$//'
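
One edge case to be aware of: a URL with no path at all, such as http://sub.domain.tld, loses its host name under this command because everything after the last slash is deleted. The sketch below is one possible guard, assuming a sed that accepts -E (GNU and BSD sed do); the function name strip_file is hypothetical.

```shell
# Plain version from the answer, applied to a URL without a path.
bare=$(printf '%s\n' 'http://sub.domain.tld' | sed 's/[^/]*$//')
printf '%s\n' "$bare"    # the host name is gone, only the scheme is left

strip_file() {
    # 1st expression: a line that is only scheme://host gets a trailing
    #                 slash, so the host is no longer "after the last slash".
    # 2nd expression: the original command, deleting whatever follows
    #                 the last slash.
    sed -E -e 's#^([a-zA-Z]+://[^/]+)$#\1/#' -e 's#[^/]*$##'
}

guarded=$(printf '%s\n' 'http://sub.domain.tld' 'http://sub.domain.tld/a/b.txt' | strip_file)
printf '%s\n' "$guarded"
```

If the input never contains bare hosts, the one-expression version is enough.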

CodePudding user response:

GNU Awk

$ echo ... | awk 'match($0,/.*\//,a){print a[0]}'
$ echo ... | awk '{print gensub(/(.*\/).*/,"\\1",1)}'
$ echo ... | awk 'sub(/[^/]*$/,"")'

http://sub.domain.tld/secured/
http://sub.domain.tld/section/files/
http://sub.domain.tld/.git/audio-files/top-secret/

xargs

$ echo ... | xargs -I{} sh -c 'echo "$(dirname "$1")/"' _ {}

http://sub.domain.tld/secured/
http://sub.domain.tld/section/files/
http://sub.domain.tld/.git/audio-files/top-secret/

CodePudding user response:

You could use awk's match function, which works in any version of awk. Pipe the output of the echo command to awk. match finds everything up to the last occurrence of / and sets RSTART and RLENGTH, and substr then prints the matched text. Subtracting 1 from RLENGTH stops just before that final /, so the output has no trailing slash; use RLENGTH on its own to keep it.

your_echo_command | awk 'match($0,/.*\//){print substr($0,RSTART,RLENGTH-1)}'
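
As a quick check of what the RLENGTH-1 does, here is a small sketch that runs the command both ways on one of the question's URLs. The variable names are only for illustration.

```shell
url='http://sub.domain.tld/section/files/image.jpg'

# RLENGTH-1 stops one character before the last slash.
without_slash=$(printf '%s\n' "$url" |
    awk 'match($0,/.*\//){print substr($0,RSTART,RLENGTH-1)}')

# RLENGTH prints the whole match, including the last slash.
with_slash=$(printf '%s\n' "$url" |
    awk 'match($0,/.*\//){print substr($0,RSTART,RLENGTH)}')

printf '%s\n' "$without_slash" "$with_slash"
```

The second form gives the trailing-slash output the question asks for.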