Delete the Repeated lines/Duplicate lines in a file using Regex

Time:11-10

messaging-1.8.2.5000.jar
m-common-1.8.2.5000.jar
validation-api-2.0.1.Final.jar
jboss-logging-3.3.2.Final.jar
classmate-1.3.4.jar
tomcat-servlet-api-10.0.6.jar
picketbox-5.1.0.Final.jar
jboss-security-spi-5.1.0.Final.jar
acl-spi-5.1.0.Final.jar
authorization-spi-5.1.0.Final.jar
common-spi-5.1.0.Final.jar
identity-spi-5.1.0.Final.jar
picketbox-spi-bare-5.1.0.Final.jar
jboss-jaspi-api_1.1_spec-1.0.0.Final.jar
jboss-servlet-api_3.1_spec-1.0.0.Final.jar
jbosssx-5.1.0.Final.jar
common-spi-5.1.0.Final.jar
validation-api-2.0.1.Final.jar
jbosssx-bare-5.1.0.Final.jar
jbossxacml-2.0.8.Final.jar
common-spi-5.1.0.Final.jar
jboss-connector-api_1.6_spec-1.0.0.Final.jar
jboss-jacc-api_1.5_spec-1.0.1.Final.jar
picketbox-commons-1.0.0.final.jar
jboss-jacc-api_1.5_spec-1.0.1.Final.jar
picketbox-acl-impl-5.1.0.Final.jar
hibernate-core-5.4.24.Final.jar
javax.persistence-api-2.2.jar
jboss-jacc-api_1.5_spec-1.0.1.Final.jar
common-spi-5.1.0.Final.jar
antlr-2.7.7.jar
validation-api-2.0.1.Final.jar
common-spi-5.1.0.Final.jar
jboss-transaction-api_1.2_spec-1.1.1.Final.jar
common-spi-5.1.0.Final.jar
jandex-2.1.3.Final.jar

I have a file that contains about 2k jar names, and most of them are repeated. I would like to keep the first occurrence of each jar name and delete every later duplicate. The same jar name can occur 2, 3 or 4 times.

Is there a way to construct a regex to achieve this?

Thanks in advance.

CodePudding user response:

A regex solution could be to run this search-and-replace combo repeatedly until it no longer finds a match, because a single pass can leave some duplicates behind:

Search

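^(.+)$([\s\S]*?)^\1$[\r\n]*
  • ^(.+)$ - capture a whole non-empty line, any line, into capture group #1 (this needs ^ and $ to match at line boundaries, i.e. multiline mode)
  • ([\s\S]*?) - capture everything up to the next duplicate into capture group #2
  • ^\1$[\r\n]* - find a later line that is exactly the text captured in group #1, plus any newlines after it; the ^ and $ anchors stop it from matching inside a longer jar name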

Replace

$1$2
  • $1 - keep the original data line
  • $2 - keep the data leading up to the duplicated data

The duplicated data is simply not retained.
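
As a rough sketch of what "run it multiple times" looks like outside an editor, the loop below applies the search/replace in Python until a pass changes nothing. It uses a line-anchored form of the search pattern so that only whole-line duplicates are removed; the function name is just for illustration, and this approach may be slow on large files.

```python
import re

# Line-anchored form of the search pattern.
# re.MULTILINE makes ^ and $ match at the start and end of every line.
DUPLICATE = re.compile(r"^(.+)$([\s\S]*?)^\1$[\r\n]*", re.MULTILINE)


def remove_duplicate_lines(text: str) -> str:
    """Apply the search/replace repeatedly until a pass changes nothing."""
    while True:
        # Keep group 1 (the first occurrence) and group 2 (the text in between);
        # the duplicate line and its trailing newline are dropped.
        result = DUPLICATE.sub(r"\1\2", text)
        if result == text:
            return result
        text = result
```

Each pass removes at most one later copy per match, which is why the loop runs until the text stops changing.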


If you're open to a software solution, then it's quite trivial with Notepad++.

Edit -> Line Operations -> Remove Duplicate Lines
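
If a small script counts as a software solution too, the same result needs no regex at all. The sketch below is one possible way to do it in Python, relying on the fact that dict keys keep insertion order; the function names and the file names in the usage note are assumptions for illustration.

```python
from pathlib import Path


def unique_lines(lines):
    """Return the lines with later duplicates dropped, keeping first-seen order."""
    # dict.fromkeys keeps only the first occurrence of each key, in order.
    return list(dict.fromkeys(lines))


def dedupe_file(source: str, target: str) -> None:
    """Read source, drop duplicate lines, and write the result to target."""
    lines = Path(source).read_text().splitlines()
    Path(target).write_text("\n".join(unique_lines(lines)) + "\n")
```

For example, dedupe_file("jars.txt", "jars-unique.txt") would write the de-duplicated list to a new file and leave the original untouched.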
