Consider the following lines:
http://regex101.com/r/aC9tW2/3
https://regex101.com/r/aC9tW2/3
https://regex101.com/r/aC9tW2/3/
https://www.regex101.com/r/aC9tW2/3
https://www.regex101.com/r/aC9tW2/3/
In practice, all of these URLs point to the same page. Strictly speaking they are not identical: they differ in scheme (http vs. https), the www. prefix, and the trailing slash.
How can I make Notepad++ remove all occurrences except the first when lines are nearly identical like this? Ideally I would keep line 2 above and delete all the others, but I'm okay with keeping only line 1 and changing HTTP to HTTPS afterwards.
CodePudding user response:
You may try the following find and replace, in regex mode:
Find: (https?):\/\/(\S+)(?:\s+https?:\/\/\2)*
Replace: $1://$2
The strategy used here is to match:
(https?) matches http or https and captures it in $1
: matches a literal colon
\/\/ matches the two literal slashes
(\S+) matches the remainder of the URL and captures it in $2
(?:\s+https?:\/\/\2)* then matches the same URL, under either scheme, zero or more further times
We then replace the whole match with $1://$2, which collapses all the duplicates into the first occurrence of the URL.
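To try the pattern outside the editor, here is a minimal sketch of the same find and replace using Python's re module. The function name collapse_scheme_duplicates and the sample text are made up for illustration; the pattern is the one from the Find field, minus the slash escapes that Python does not need.

```python
import re

# Same pattern as the Find field, written for Python's re module.
# \2 is the backreference to the captured remainder of the URL.
DUPLICATE_URL = re.compile(r"(https?)://(\S+)(?:\s+https?://\2)*")


def collapse_scheme_duplicates(text: str) -> str:
    """Replace each run of the same URL (http or https) with its first occurrence."""
    # \1 is the scheme of the first occurrence, \2 is the rest of the URL.
    return DUPLICATE_URL.sub(r"\1://\2", text)


sample = (
    "http://regex101.com/r/aC9tW2/3\n"
    "https://regex101.com/r/aC9tW2/3\n"
)
print(collapse_scheme_duplicates(sample))
```

In this sketch only the scheme is treated as interchangeable. Run on the five lines from the question, it leaves two lines behind (one without www. and one with), because the www. prefix is not matched by the backreference, and the pattern has no end-of-line anchor, so a trailing slash on a later duplicate stays attached to the result.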
CodePudding user response:
A shorter alternative, which I copied from another Stack Overflow answer, still works. Check it out:
Find: (\S+)([^ ]+\1)*
Replace with: $1
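If a script is an option instead of the editor, the "very similar" comparison from the question can be sketched in Python by reducing each line to a key and keeping the first line per key. The function names and the normalization rules here (drop the scheme, drop a leading www., drop trailing slashes) are assumptions for illustration, not part of either answer.

```python
import re


def canonical_key(url: str) -> str:
    """Reduce a URL to a comparison key (hypothetical normalization rules)."""
    key = re.sub(r"^https?://", "", url.strip(), flags=re.IGNORECASE)  # drop scheme
    key = re.sub(r"^www\.", "", key, flags=re.IGNORECASE)              # drop www.
    return key.rstrip("/")                                             # drop trailing slashes


def keep_first_of_similar(lines):
    """Keep only the first line for each canonical key, preserving order."""
    seen = set()
    kept = []
    for line in lines:
        key = canonical_key(line)
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


urls = [
    "http://regex101.com/r/aC9tW2/3",
    "https://regex101.com/r/aC9tW2/3",
    "https://regex101.com/r/aC9tW2/3/",
    "https://www.regex101.com/r/aC9tW2/3",
    "https://www.regex101.com/r/aC9tW2/3/",
]
print(keep_first_of_similar(urls))
```

This keeps line 1 of the question's list, so the HTTP to HTTPS change would still be a separate step afterwards.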