Home > Back-end >  Google Sheets: Removing everything from a URL up to '.com', and keeping everything after t
Google Sheets: Removing everything from a URL up to '.com', and keeping everything after t

Time:07-12

I've been searching for a formula on Google Sheets that removes everything from a url, up to the first /. For example:

www.example.com/example/
www.example.com/example/example1
www.example.com/example/example1/example2

to:

/example/
/example/example1/
/example/example1/example2

Any help would be greatly appreciated!

CodePudding user response:

Say your URL is in A1. Then

=regexreplace(A1,"[\w.-]*\.com/","")

works as long as the site portion of your URLs always end with .com.

When that is not true, it's a bit more cumbersome:

=substitute(A20,regexextract(A20,"[\w-.]*/"),"")

Quickly on why they work: [\w.-] means any character from A-Z, a-z, 0-9, _, ., -. * means any number of the preceding. regexreplace() replaces all matching patterns. And that's why without specifically .com, regexreplace() doesn't work for your problem. regexextract() extracts the part of URL that you don't want. So we can use substitute() to get rid of it.


If you are new to regular expression, I think this example is simple enough to get a little bit into it. For more complex scenarios, the use of regular expression will save us tons of work.

In your current problem, since you only need to recognize the first occurrence of / without more complex patterns, we can just search for that with find() and thus

=right(A1,len(A1)-find("/",A1))

also works.

CodePudding user response:

Try this

on column A is your input
paste this in column B =IF(A2="",,RIGHT(REGEXEXTRACT(A2,".com\/. "),LEN(REGEXEXTRACT(A2,".com\/. "))-4))
you can drag it down without any problem

Hope it answered your question.

  • Related