Find and replace a href value with PowerShell?

Time:11-04

I have an HTML file with a lot of links in it. They are in the format http:/oldsite/showFile.asp?doc=1234&lib=lib1 and I'd like to replace them with http://newsite/?lib=lib1&doc=1234

(1234 and lib1 are variable)

Any idea on how to do that?

Thanks P

Answer:

Read in the file, loop through each line, replace the old value with the new value, and send the output to a new file:

gc file.html | % { $_.Replace('oldsite...','newsite...') } | out-file new-file.html
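The string method .Replace() does a literal substitution, so it can swap fixed text but cannot move the doc and lib values into a different order. A minimal sketch of the same per-line loop using PowerShell's regex-based -replace operator instead, with capture groups so the parameters can be reordered. The two sample lines are made up, and the pattern assumes the links look exactly like the ones in the question; on a real file you would start the pipeline with gc file.html and end it with | out-file new-file.html as above.

```powershell
# Made-up sample lines standing in for the contents of file.html
$lines = @(
    '<a href="http:/oldsite/showFile.asp?doc=1234&lib=lib1">Doc 1234</a>'
    '<p>This line has no link.</p>'
)

# Same per-line loop, but with the regex -replace operator.
# (doc=\d+) and (lib=\w+) are captured as $1 and $2 and written back in swapped order.
# Single quotes stop PowerShell from expanding $1/$2 itself; the regex engine fills them in.
$converted = $lines | ForEach-Object {
    $_ -replace 'http:/oldsite/showFile\.asp\?(doc=\d+)&(lib=\w+)', 'http://newsite/?$2&$1'
}

$converted
```

Lines without a matching link pass through the loop unchanged.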

Answer:

I don't think your examples are correct.

http:/oldsite/showFile.asp?doc=1234&lib=lib1 should be
http:/oldsite/showFile.asp?doc=1234&lib=lib1

and

http://newsite/?lib=lib1&doc=1234 should be http://newsite?lib=lib1&doc=1234

To do the replacement on one of these URLs, you can use:

'http:/oldsite/showFile.asp?doc=1234&lib=lib1' -replace 'http:/oldsite/showFile\.asp\?(doc=\d+)&(lib=\w+)', 'http://newsite?$2&$1'

which returns http://newsite?lib=lib1&doc=1234
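If the numbered backreferences $1 and $2 are hard to follow, .NET regular expressions (which -replace uses) also support named capture groups, referenced as ${name} in the replacement string. A sketch of the same single-string replacement written that way; the group names doc and lib are just illustrative choices:

```powershell
$old = 'http:/oldsite/showFile.asp?doc=1234&lib=lib1'

# (?<doc>\d+) and (?<lib>\w+) are named capture groups; ${doc} and ${lib} refer to them.
# The replacement must stay single-quoted: in double quotes PowerShell would expand
# ${lib} and ${doc} as its own (empty) variables before the regex engine sees them.
$new = $old -replace 'http:/oldsite/showFile\.asp\?doc=(?<doc>\d+)&lib=(?<lib>\w+)', 'http://newsite?lib=${lib}&doc=${doc}'

$new   # http://newsite?lib=lib1&doc=1234
```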

To replace these in a file you can use:

(Get-Content -Path 'X:\TheHtmlFile.html' -Raw) -replace 'http:/oldsite/showFile\.asp\?(doc=\d+)&(lib=\w+)', 'http://newsite?$2&$1' |
    Set-Content -Path 'X:\TheNewHtmlFile.html'
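In HTML source, an ampersand inside an href is often written as &amp;, and a pattern containing a bare & will not match such links. A sketch of a variation, assuming the file may contain either form: the separator is captured as its own group and written back unchanged. The here-string stands in for the (Get-Content -Path ... -Raw) call in the file command; the sample links are invented.

```powershell
# Invented HTML standing in for (Get-Content -Path 'X:\TheHtmlFile.html' -Raw)
$html = @'
<a href="http:/oldsite/showFile.asp?doc=1234&lib=lib1">raw ampersand</a>
<a href="http:/oldsite/showFile.asp?doc=5678&amp;lib=lib2">encoded ampersand</a>
'@

# (&(?:amp;)?) matches either "&" or "&amp;" and captures it as group 2,
# so each rewritten link keeps the same separator style it had before.
$pattern     = 'http:/oldsite/showFile\.asp\?(doc=\d+)(&(?:amp;)?)(lib=\w+)'
$replacement = 'http://newsite?$3$2$1'

$result = $html -replace $pattern, $replacement
$result
```

The result can then be piped to Set-Content exactly as in the file command.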

Regex details:

http:/oldsite/showFile        Match the characters “http:/oldsite/showFile” literally
\.                            Match the character “.” literally
asp                           Match the characters “asp” literally
\?                            Match the character “?” literally
(                             Match the regular expression below and capture its match into backreference number 1
   doc=                       Match the characters “doc=” literally
   \d                         Match a single digit 0..9
      +                       Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
&                             Match the character “&” literally
(                             Match the regular expression below and capture its match into backreference number 2
   lib=                       Match the characters “lib=” literally
   \w                         Match a single character that is a “word character” (letters, digits, etc.)
      +                       Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
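To check what the two groups actually capture, [regex]::Match returns a Match object whose Groups collection holds each backreference. A small sketch against the sample URL from the question:

```powershell
$url = 'http:/oldsite/showFile.asp?doc=1234&lib=lib1'

# Groups[1] and Groups[2] correspond to backreferences 1 and 2 described above;
# Groups[0] would be the whole matched text.
$m = [regex]::Match($url, 'http:/oldsite/showFile\.asp\?(doc=\d+)&(lib=\w+)')

$m.Success            # True
$m.Groups[1].Value    # doc=1234
$m.Groups[2].Value    # lib=lib1
```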