I have a list of phone numbers (25 million) that I want to use as an input file. Say I have an email/phone database, and I want to extract only the phone numbers that also appear in the input file (25 million entries). How can I do that in EmEditor, or with any tool that handles large files?
CodePudding user response:
To extract all matched strings
Suppose you have a 25 million phone number list (file A) and a phone-email database file (file B).
- Open file B and use a regular expression to extract the phone numbers only. To do this, press Ctrl+F to bring up the Find dialog box, set the Regular Expressions option, and, depending on the phone number format, enter [0-9]{3,3}-[0-9]{3,3}-[0-9]{4,4} or \([0-9]{3,3}\)[0-9]{3,3}-[0-9]{4,4} in the Find box. Click the Extract button, and save the phone-numbers-only database as a new file (file C).
- Open file A and select Tab (or any CSV format) on the CSV toolbar (or select the Edit menu - CSV - Tab separated).
- Open file C and select the same CSV format as in the previous step.
- Click the Join CSV button on the Sort toolbar (or select the Edit menu - CSV - Join) to bring up the Join CSV dialog box.
- Select A.txt and set the Unique Key option as CSV Document 1, and select C.txt and set the Unique Key option as CSV Document 2.
- Select Whole strings match as the Conditions, and set the Match Case option.
- Deselect A.txt from the list box, and ensure C.txt is selected.
- Click Join Now. A new document will be created with all matched strings. Save this file as file D.
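If you would rather script this outside EmEditor, the same idea (extract the phone numbers from file B, then keep only those that also appear in file A) can be sketched in Python. This is only a rough sketch, not what EmEditor does internally. It assumes the phone list fits in memory as a set, that each line of file A holds exactly one phone number, and that the numbers use one of the two formats above. The function names are made up for illustration, and reporting each matched number only once is a choice of this sketch.

```python
import re

# The same two formats as the regular expressions above:
# 123-456-7890 or (123)456-7890.
PHONE_RE = re.compile(r"[0-9]{3}-[0-9]{3}-[0-9]{4}|\([0-9]{3}\)[0-9]{3}-[0-9]{4}")


def extract_phones(lines):
    """Return every phone-number string found in the lines (the file C step)."""
    found = []
    for line in lines:
        found.extend(PHONE_RE.findall(line))
    return found


def matched_strings(list_lines, database_lines):
    """Return phone numbers from the database that also appear in the list (file D).

    Matching is on the whole string and case-sensitive, like the
    "Whole strings match" and "Match Case" options.
    """
    wanted = {line.strip() for line in list_lines if line.strip()}
    seen = set()
    result = []
    for phone in extract_phones(database_lines):
        # Keep each matching number once, in order of first appearance.
        if phone in wanted and phone not in seen:
            seen.add(phone)
            result.append(phone)
    return result
```

Both functions accept any iterable of lines, so you can pass open file objects directly and the database file is never loaded into memory as a whole.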
To extract all matched lines
If file D is small enough, you can use Advanced Filter to filter file B with the contents of file D.
- Copy the file D contents to the Clipboard. (To do this, open file D with EmEditor, and press Ctrl+A, then Ctrl+C.)
- Open file B with EmEditor, and click Advanced Filter on the Filter toolbar.
- Right-click on the list box, and paste the Clipboard contents.
- While all items in the list box are selected, make sure the Logical Disjunction (OR) to the Previous Condition option is set.
- Click the Filter button, and click the Close button if necessary.
- Click the Extract All button to extract all matched lines to a new document.
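If file D is too large for Advanced Filter, a scripted equivalent may be more practical. The Python sketch below keeps every line of file B that contains at least one phone number from file D. Note that it swaps in a different technique: instead of one OR condition per number, it pulls the numbers out of each line with the regular expression and looks them up in a set. The function names and the file paths passed to `filter_file` are hypothetical, and UTF-8 encoding is an assumption.

```python
import re

# Same two phone formats as in the extraction step above.
PHONE_RE = re.compile(r"[0-9]{3}-[0-9]{3}-[0-9]{4}|\([0-9]{3}\)[0-9]{3}-[0-9]{4}")


def matching_lines(database_lines, wanted_phones):
    """Yield each database line containing at least one wanted phone number."""
    wanted = set(wanted_phones)
    for line in database_lines:
        if any(phone in wanted for phone in PHONE_RE.findall(line)):
            yield line


def filter_file(database_path, phones_path, output_path):
    """Write the matching lines of the database file to output_path.

    All three paths are placeholders supplied by the caller.
    """
    with open(phones_path, encoding="utf-8") as phones:
        wanted = {line.strip() for line in phones if line.strip()}
    with open(database_path, encoding="utf-8") as database, \
            open(output_path, "w", encoding="utf-8") as output:
        output.writelines(matching_lines(database, wanted))
```

For example, `filter_file("B.txt", "D.txt", "matched_lines.txt")` would stream file B line by line, so only the phone list itself has to fit in memory.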