I have a file with a very long list of words. It looks something like this, except much longer and with far more words:
Green, Hello, Blue, Pink, Derek, Baby, Orange
Blue, Grey, Yes, Balls, Orange, Ship, Navy
Money, Help, Yellow, Queen, Blue, Pink, Green
What I want to do is remove all the words I want to get rid of, leaving only the words I want to keep, which are the colours. BUT I need to do this by assembling a list of the words I want to keep, NOT the words I want to get rid of.
So let's say I want to keep the words "Green, Blue, Pink, Orange, Grey, Navy, Yellow" and discard the rest while keeping the line structure. After running the replace function, I want the file to look like this:
Green, Blue, Pink, Orange
Blue, Grey, Orange, Navy
Yellow, Blue, Pink, Green
I can't do this word by word, as the file is far too long and has far too many different words to get rid of. I just want to tell Notepad++ which words I want to keep and discard the rest. Does anybody know how I would achieve this?
CodePudding user response:
I can only think of a regular expression where you need to list the same words twice:
1. If the list of words to keep is relatively small:
Find what: (, )(?!(Green|Blue|Orange)\b)\w+|\b(?!(Green|Blue|Orange)\b)\w+(, )?
Replace with: (empty)
⦿ Regular expression
Replace All
2. If the list of words to remove is relatively small:
Find what: (, )(Hello|Derek|Baby|Yes|Balls)\b|\b(Hello|Derek|Baby|Yes|Balls)(, )?
Replace with: (empty)
⦿ Regular expression
Replace All
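For anyone who would rather script this than type the keep list twice, here is a rough Python sketch of the same idea as option 1. The pattern is built from a single list, so the words only appear once. The names `KEEP`, `build_pattern` and `keep_only` are made up for illustration, and it assumes the words are always separated by exactly ", " as in the question.

```python
import re

# Hypothetical keep list, taken from the question.
KEEP = ["Green", "Blue", "Pink", "Orange", "Grey", "Navy", "Yellow"]


def build_pattern(keep):
    """Build the option-1 style regex from one list of words to keep."""
    alternation = "|".join(re.escape(word) for word in keep)
    # A whole word that is NOT one of the kept words.
    not_kept = rf"(?!(?:{alternation})\b)\w+"
    # Either ", word" (word in the middle or at the end of a line)
    # or "word, " (word at the start of a line).
    return re.compile(rf", {not_kept}|\b{not_kept}(?:, )?")


def keep_only(text, keep):
    """Delete every word that is not in `keep`, leaving line breaks alone."""
    return build_pattern(keep).sub("", text)


sample = (
    "Green, Hello, Blue, Pink, Derek, Baby, Orange\n"
    "Blue, Grey, Yes, Balls, Orange, Ship, Navy\n"
    "Money, Help, Yellow, Queen, Blue, Pink, Green"
)

print(keep_only(sample, KEEP))
```

On the three sample lines from the question this should print the three cleaned-up lines shown there. The string produced by `build_pattern(KEEP).pattern` could also be pasted into the "Find what" box, though I have only checked it against Python's `re` module.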
CodePudding user response:
I can suggest this pattern:
\b(?!Green|Blue|Orange|Grey|Navy|Yellow|Pink)\w+\W*
This pattern uses a negative lookahead, which is supported by the Notepad++ search-and-replace engine.
Replace every match with an empty string. The problem is that the result might not be clean: some lines may be left with a trailing comma and space. But you can get rid of those easily with another replace operation.
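To illustrate the two passes (remove, then clean up), here is a small Python sketch. It is only an approximation of what Notepad++ would do: the function name `remove_unlisted` is invented for the example, and it works one line at a time so that `\W*` cannot swallow a line break.

```python
import re

# Hypothetical keep list, taken from the question.
KEEP = ["Green", "Blue", "Pink", "Orange", "Grey", "Navy", "Yellow"]


def remove_unlisted(line, keep):
    """Apply the negative-lookahead pattern to one line, then tidy the end."""
    alternation = "|".join(re.escape(word) for word in keep)
    # Pass 1: delete each word that does not start with a kept word,
    # together with any punctuation/space that follows it.
    # Note: without a trailing \b, a word like "Greenish" would also be kept.
    stripped = re.sub(rf"\b(?!{alternation})\w+\W*", "", line)
    # Pass 2: the "another replace operation" that removes a leftover
    # trailing comma and/or spaces.
    return re.sub(r"[,\s]+$", "", stripped)


lines = [
    "Green, Hello, Blue, Pink, Derek, Baby, Orange",
    "Blue, Grey, Yes, Balls, Orange, Ship, Navy",
    "Money, Help, Yellow, Queen, Blue, Pink, Green",
]

for line in lines:
    print(remove_unlisted(line, KEEP))
```

The second substitution only matters when the last word on a line is one of the removed words, for example "Blue, Baby", which becomes "Blue, " after the first pass.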
CodePudding user response:
(?!Hello|Derek|Baby|Yes|Balls\b)\b\w+
This should work for you, @Chris H.