Notepad++: How to keep the first occurrence of almost identical lines?

Time:03-09

Consider the following lines:

http://regex101.com/r/aC9tW2/3
https://regex101.com/r/aC9tW2/3
https://regex101.com/r/aC9tW2/3/
https://www.regex101.com/r/aC9tW2/3
https://www.regex101.com/r/aC9tW2/3/

In practice, all of these URLs point to the same page, even though as strings they differ slightly (http vs. https, a www. prefix, a trailing slash).

How can I make Notepad++ remove all occurrences except the first when lines are nearly identical? Ideally I would keep line 2 above and delete the others, but I'm fine with keeping only line 1 and changing http to https afterwards.

CodePudding user response:

You may try the following find and replace, in regex mode:

Find:    (https?):\/\/(\S+)(?:\s+https?:\/\/\2)*
Replace: $1://$2

The strategy used here is to match:

(https?)                match http or https and capture in $1
:                       :
\/\/                    //
(\S+)                   match and capture remainder of URL in $2
(?:\s+https?:\/\/\2)*   then match the same URL zero or more subsequent times

We then replace with $1://$2, which collapses each run of consecutive duplicates into the first occurrence of the URL.
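
To try the same idea outside the editor, here is a minimal Python sketch of that find-and-replace, assuming the pattern is written with (\S+) and \s+ as the quantified pieces. The function name collapse_adjacent is made up for illustration. Python spells the replacement groups \1 and \2 rather than $1 and $2.

```python
import re

# Same pattern as the Find field; the slashes need no escaping in Python.
PATTERN = re.compile(r"(https?)://(\S+)(?:\s+https?://\2)*")


def collapse_adjacent(text: str) -> str:
    """Collapse each run of adjacent URLs that differ only by scheme
    into the first URL of the run."""
    return PATTERN.sub(r"\1://\2", text)


sample = (
    "http://regex101.com/r/aC9tW2/3\n"
    "https://regex101.com/r/aC9tW2/3\n"
)
print(collapse_adjacent(sample))
```

Because \2 must match the captured text exactly, this only merges lines that differ in http vs. https. On the full sample from the question, the variants with www. or a trailing slash are not recognized as the same URL, so some lines survive.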

CodePudding user response:

Here is a shorter alternative, copied from another Stack Overflow answer, that still works:

Find:(\S+)([^ ]+\1)*

Replace with:$1
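
A backreference such as \1 or \2 only matches text identical to what was captured, so a regex alone will not treat a www. prefix or a trailing slash as "the same URL". If running a small script on the file is an option, the following is a rough Python sketch of a different technique: normalize each line to a comparison key, then keep the first line seen for each key. The names normalize and keep_first, and the choice of what to ignore (scheme, a leading www., trailing slashes), are assumptions for illustration, not anything Notepad++ provides.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Build a comparison key that ignores scheme, a leading 'www.',
    and trailing slashes (assumed equivalences, adjust as needed)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def keep_first(lines):
    """Return the lines with later near-duplicates removed, order preserved."""
    seen = set()
    kept = []
    for line in lines:
        if not line.strip():
            kept.append(line)  # leave blank lines alone
            continue
        key = normalize(line)
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


urls = [
    "http://regex101.com/r/aC9tW2/3",
    "https://regex101.com/r/aC9tW2/3",
    "https://regex101.com/r/aC9tW2/3/",
    "https://www.regex101.com/r/aC9tW2/3",
    "https://www.regex101.com/r/aC9tW2/3/",
]
print(keep_first(urls))
```

This keeps line 1 of the question's sample; changing http to https afterwards is then a plain search and replace.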
