I have a HTML file with a load of links in it.
They are in the format
http:/oldsite/showFile.asp?doc=1234&lib=lib1
I'd like to replace them with
http://newsite/?lib=lib1&doc=1234
(1234 and lib1 are variable)
Any idea on how to do that?
Thanks P
CodePudding user response:
Read in the file, loop through each line and replace the old value with the new value, send the output to the a new file:
gc file.html | % { $_.Replace('oldsite...','newsite...') } | out-file new-file.html
CodePudding user response:
I don't think your examples are correct.
http:/oldsite/showFile.asp?doc=1234&lib=lib1
should be
http:/oldsite/showFile.asp?doc=1234&lib=lib1
and
http://newsite/?lib=lib1&doc=1234
should be http://newsite?lib=lib1&doc=1234
To do the replacement on these, you can do
'http:/oldsite/showFile.asp?doc=1234&lib=lib1' -replace 'http:/oldsite/showFile\.asp\?(doc=\d )&(lib=\w )', 'http://newsite?$2&$1'
which returns http://newsite?lib=lib1&doc=1234
To replace these in a file you can use:
(Get-Content -Path 'X:\TheHtmlFile.html' -Raw) -replace 'http:/oldsite/showFile\.asp\?(doc=\d )&(lib=\w )', 'http://newsite?$2&$1' |
Set-Content -Path 'X:\TheNewHtmlFile.html'
Regex details:
http:/oldsite/showFile Match the characters “http:/oldsite/showFile” literally
\. Match the character “.” literally
asp Match the characters “asp” literally
\? Match the character “?” literally
( Match the regular expression below and capture its match into backreference number 1
doc= Match the characters “doc=” literally
\d Match a single digit 0..9
Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
& Match the character “&” literally
( Match the regular expression below and capture its match into backreference number 2
lib= Match the characters “lib=” literally
\w Match a single character that is a “word character” (letters, digits, etc.)
Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)