What is the correct regex pattern to use to clean up Google links in Vim?

Time:12-31

As you know, Google links can be pretty unwieldy:

https://www.google.com/search?q=some search here&source=hp&newwindow=1&ei=A_23ssOllsUx&oq=some se....

I have MANY Google links saved that I would like to clean up to make them look like so:

https://www.google.com/search?q=some search here

The only issue is that I cannot figure out the correct regex pattern for Vim to do this.

I figure it must be something like this:

:%s/&source=[^&].*//

:%s/&source=[^&].*[^&]//

:%s/&source=.*[^&]//

But none of these are working; they start at &source, and replace until the end of the line.

Also, the search?q=some search here can appear anywhere after the .com/, so I cannot rely on it being in the same place every time.

So, what is the correct Vim regex pattern to use in order to clean up these links?

CodePudding user response:

Your example can easily be dealt with by using a very simple pattern:

:%s/&.*

because you want to keep everything that comes before the second parameter, which is marked by the first & in the string.

But, if the q parameter can be anywhere in the query string, as in:

https://www.google.com/search?source=hp&newwindow=1&q=some search here&ei=A_23ssOllsUx&oq=some se....

then no amount of capturing or whatnot will be enough to cover every possible case with a single pattern, let alone a readable one. At this point, scripting is really the only reasonable approach, preferably with a language that understands URLs.
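For what it's worth, here is a minimal sketch of that scripting approach in Python, using the standard urllib.parse module. The function name keep_only_param is made up for illustration, and the sketch assumes one URL per string and that only the q parameter should survive. It filters the raw query string rather than decoding and re-encoding it, so the spaces in the example links are left as they are.

```python
from urllib.parse import urlsplit, urlunsplit


def keep_only_param(url, name="q"):
    """Return url with every query parameter except `name` removed.

    Hypothetical helper for illustration; not part of any library.
    """
    parts = urlsplit(url)
    # Split the raw query on "&" and keep only the pairs whose key is `name`.
    # Comparing the whole key means "oq=..." is not mistaken for "q=...".
    kept = [pair for pair in parts.query.split("&")
            if pair.split("=", 1)[0] == name]
    if not kept:
        # No such parameter: leave the link untouched.
        return url
    return urlunsplit(parts._replace(query="&".join(kept)))


messy = ("https://www.google.com/search?source=hp&newwindow=1"
         "&q=some search here&ei=A_23ssOllsUx&oq=some se....")
print(keep_only_param(messy))
```

Wrapped in a small script that reads lines from standard input, something like this could presumably be run over a whole buffer with Vim's :%! filter command.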

--- EDIT ---

Hmm, scratch that. The following seems to work across the board:

:%s@^\(https://www.google.com/search?\)\(.*\)\(q=.\{-}\)&.*@\1\3
  • We use @ as separator because of the many / in a typical URL.
  • We capture a first group, up to and including the ? that marks the beginning of the query string.
  • We match whatever comes between the ? and q= in a second group, which is captured but never reused. Because .* is greedy, the q= it stops at is the last one on the line that is still followed by an &.
  • We capture a third group, the q parameter, up to and excluding the next &.
  • We replace the whole thing with the first capture group followed by the third capture group (\1\3).
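To see what each group picks up, the same substitution can be tried outside Vim. The following is only a rough Python translation of the pattern above, not the Vim command itself: \( \) becomes ( ), \{-} becomes the lazy *?, and the ? and dots are escaped because Python treats them differently. The names PATTERN and clean are made up for this sketch.

```python
import re

# Group 1: the prefix up to and including "?".
# Group 2: whatever precedes the q parameter (greedy, discarded).
# Group 3: the q parameter, up to but excluding the next "&".
PATTERN = re.compile(
    r"^(https://www\.google\.com/search\?)(.*)(q=.*?)&.*"
)


def clean(line):
    # Keep groups 1 and 3, like the \1\3 replacement in the Vim command.
    return PATTERN.sub(r"\1\3", line)


first = ("https://www.google.com/search?q=some search here"
         "&source=hp&newwindow=1&ei=A_23ssOllsUx&oq=some se....")
second = ("https://www.google.com/search?source=hp&newwindow=1"
          "&q=some search here&ei=A_23ssOllsUx&oq=some se....")
print(clean(first))
print(clean(second))
```

Both sample links come out as https://www.google.com/search?q=some search here. Note that the pattern requires an & after the q parameter, so a line where q is the last parameter is left untouched.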