Use Notepad++ to remove every word, except those in a list, keeping the line structure

Time:06-17

I have a file with a very long list of words. It looks something like this, only much longer and with far more words:

Green, Hello, Blue, Pink, Derek, Baby, Orange

Blue, Grey, Yes, Balls, Orange, Ship, Navy

Money, Help, Yellow, Queen, Blue, Pink, Green

I want to remove every word except the ones I want to keep, which are the colours. BUT I need to do this by assembling a list of the words I want to keep, NOT a list of the words I want to get rid of.

So let's say I want to keep the words "Green, Blue, Pink, Orange, Grey, Navy, Yellow" and discard the rest, while keeping the line structure. After running the replace function, I want the file to look like this:

Green, Blue, Pink, Orange

Blue, Grey, Orange, Navy

Yellow, Blue, Pink, Green

I can't do this word by word, as the file is far too long and has far too many different words to get rid of. I just want to tell Notepad++ which words I want to keep and have it discard the rest. Does anybody know how I would achieve this?

CodePudding user response:

I can only think of a regular expression where you need to list the same words twice:

1. If the list of words to keep is relatively small:

Find what: (, )(?!(Green|Blue|Orange)\b)\w+|\b(?!(Green|Blue|Orange)\b)\w+(, )?
Replace with: (empty)
⦿ Regular expression

Replace All

2. If the list of words to remove is relatively small:

Find what: (, )(Hello|Derek|Baby|Yes|Balls)\b|\b(Hello|Derek|Baby|Yes|Balls)(, )?
Replace with: (empty)
⦿ Regular expression

Replace All
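
For anyone who wants to check the first pattern outside the editor, here is a minimal Python sketch of the same keep-list idea. It assumes Python's `re` module treats this pattern the same way as the Notepad++ engine, which holds for the constructs used here as far as I know. The names `KEEP`, `keep_only`, and `sample` are made up for this example. The pattern is built from the list, so the words are only typed once.

```python
import re

# Hypothetical keep list, taken from the question.
KEEP = ["Green", "Blue", "Pink", "Orange", "Grey", "Navy", "Yellow"]


def keep_only(text, keep):
    """Remove every word that is not in `keep`, leaving line breaks alone."""
    words = "|".join(re.escape(w) for w in keep)
    pattern = re.compile(
        # Alternative 1: a separator followed by a word that is not on the list.
        rf"(, )(?!(?:{words})\b)\w+"
        # Alternative 2: a word not on the list (e.g. first on its line),
        # together with the separator that follows it, if any.
        rf"|\b(?!(?:{words})\b)\w+(, )?"
    )
    return pattern.sub("", text)


sample = (
    "Green, Hello, Blue, Pink, Derek, Baby, Orange\n"
    "Blue, Grey, Yes, Balls, Orange, Ship, Navy\n"
    "Money, Help, Yellow, Queen, Blue, Pink, Green\n"
)
print(keep_only(sample, KEEP))
```

Neither alternative can match a newline, so the line structure is preserved.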

CodePudding user response:

I can suggest this pattern:

\b(?!Green|Blue|Orange|Grey|Navy|Yellow|Pink)\w+\W*

Regex101 test page

This pattern uses a negative lookahead, which is supported by the Notepad++ search-and-replace engine.

Replace every match with an empty string. The catch is that the result might not be clean: some lines may be left with a trailing comma and space. You can easily get rid of those with a second replace operation.
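
The two-step idea (remove the unlisted words, then tidy the leftovers) can be sketched in Python as below. This is a variation on the pattern above, not the exact pattern: I swapped `\W*` for `[^\w\r\n]*`, because `\W` also matches line breaks and could join two lines when the last word on a line is removed. I also added `\b` after the lookahead so that a longer word such as "Greenish" is not kept by accident. The names `drop_unlisted`, `trailing_sep`, and `keep_only_two_pass` are made up for this example.

```python
import re

# Hypothetical keep list, taken from the question.
KEEP = ["Green", "Blue", "Pink", "Orange", "Grey", "Navy", "Yellow"]
words = "|".join(re.escape(w) for w in KEEP)

# Pass 1: drop every word not on the keep list, plus the separators after it.
# [^\w\r\n]* is used instead of \W* so that line breaks are never consumed
# (a change made for this sketch).
drop_unlisted = re.compile(rf"\b(?!(?:{words})\b)\w+[^\w\r\n]*")

# Pass 2: strip a comma and any spaces or tabs left at the end of a line.
trailing_sep = re.compile(r",[ \t]*$", re.MULTILINE)


def keep_only_two_pass(text):
    """Apply the removal pass, then the cleanup pass."""
    return trailing_sep.sub("", drop_unlisted.sub("", text))


sample = "Green, Hello, Blue, Ship\nYes, Navy, Balls\n"
print(keep_only_two_pass(sample))
```

In Notepad++ the second pass would be a separate Replace All with something like `,[ \t]*$` in the "Find what" box and nothing in "Replace with".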

CodePudding user response:

(?!Hello|Derek|Baby|Yes|Balls\b)\b\w+

This should work for you, @Chris H.
