Suppose you have a text file containing contact information (e.g. name, phone, email) for multiple people.
Using the example below and Python (and perhaps regex), how would you extract the block of information for each person and write it to a new file? The result should be a separate text file for each person, containing only their info.
The input text looks something like this:
name:: Joe Blogs
phone:: 123456789
email:: [email protected]
address:: 123 Main Street
note:: blah blah blah

name:: Josephine Blogs
phone:: 43217890
email:: [email protected]
address:: 123 Main Street
note:: More blah blah

name:: John Smith
phone:: 23498689
email:: [email protected]
address:: 1 North Street
note:: Some more blah
CodePudding user response:
If the file easily fits in memory, you can simply split on \n\n:

with open('file.txt') as f:
    # each blank-line-separated block becomes its own numbered file
    for n, chunk in enumerate(f.read().split('\n\n'), start=1):
        with open(f'chunk_{n}.txt', 'w') as f_out:
            f_out.write(chunk)

file.txt:
name:: Joe Blogs
phone:: 123456789
email:: [email protected]
address:: 123 Main Street
note:: blah blah blah

name:: Josephine Blogs
phone:: 43217890
email:: [email protected]
address:: 123 Main Street
note:: More blah blah

name:: John Smith
phone:: 23498689
email:: [email protected]
address:: 1 North Street
note:: Some more blah
output files:
chunk_1.txt
name:: Joe Blogs
phone:: 123456789
email:: [email protected]
address:: 123 Main Street
note:: blah blah blah
chunk_2.txt
name:: Josephine Blogs
phone:: 43217890
email:: [email protected]
address:: 123 Main Street
note:: More blah blah
chunk_3.txt
name:: John Smith
phone:: 23498689
email:: [email protected]
address:: 1 North Street
note:: Some more blah
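
One caveat with a plain split('\n\n') is that it assumes exactly one empty line between records. If your file might contain several blank lines in a row, or "blank" lines that hold stray spaces, a regex split is a bit more forgiving. This is only a sketch under that assumption; split_blocks is a made-up helper name, not something from the standard library:

```python
import re


def split_blocks(text):
    """Split text into blocks separated by one or more blank lines.

    A "blank" line may contain spaces or tabs. Leading and trailing
    whitespace of the whole text is dropped first, so no empty block
    is produced at either end.
    """
    return [block for block in re.split(r'\n\s*\n', text.strip()) if block]


if __name__ == '__main__':
    demo = 'name:: A\nphone:: 1\n\n\nname:: B\nphone:: 2\n'
    for n, block in enumerate(split_blocks(demo), start=1):
        print(f'--- chunk_{n} ---')
        print(block)
```

You could then write each returned block to its own file exactly as in the loop above.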
For larger files, you can use a more classical loop that writes the lines one by one to the current output file and switches to a new file whenever it encounters a blank line:

with open('file.txt') as f:
    n = 1
    f_out = open(f'chunk_{n}.txt', 'w')
    for line in f:
        if line.strip():
            f_out.write(line)
        else:
            # blank line: close the current chunk and start the next one
            f_out.close()
            n += 1
            f_out = open(f'chunk_{n}.txt', 'w')
    f_out.close()

output:
==> chunk_1.txt <==
name:: Joe Blogs
phone:: 123456789
email:: [email protected]
address:: 123 Main Street
note:: blah blah blah
==> chunk_2.txt <==
name:: Josephine Blogs
phone:: 43217890
email:: [email protected]
address:: 123 Main Street
note:: More blah blah
==> chunk_3.txt <==
name:: John Smith
phone:: 23498689
email:: [email protected]
address:: 1 North Street
note:: Some more blah
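
Note that the loop above opens a new file on every blank line, so two blank lines in a row (or a blank line at the end of the file) would leave an empty chunk behind. If that matters for your data, one possible variant groups consecutive non-blank lines with itertools.groupby and still reads the input line by line. The names split_records and write_chunks are just illustrative:

```python
from itertools import groupby
from pathlib import Path


def split_records(lines):
    """Yield each run of consecutive non-blank lines as a list of lines."""
    for is_blank, group in groupby(lines, key=lambda line: not line.strip()):
        if not is_blank:
            yield list(group)


def write_chunks(src, out_dir):
    """Write every record found in `src` to chunk_N.txt inside `out_dir`."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(src) as f:
        # the file object is iterated lazily, one line at a time
        for n, record in enumerate(split_records(f), start=1):
            target = out_dir / f'chunk_{n}.txt'
            target.write_text(''.join(record))
            written.append(target)
    return written
```

Calling write_chunks('file.txt', 'out') should then give one file per record, no matter how many blank lines separate them. Only one record is held in memory at a time.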
CodePudding user response:
Here's another neat implementation with regex.

import re

# one named group per field; a record is five consecutive "key:: value" lines
pattern = re.compile(r'name::\s*(?P<name>.+?)\nphone::\s*(?P<phone>\d+)\nemail::\s*(?P<email>.+?)\naddress::\s*(?P<address>.*?)\nnote::\s*(?P<note>.*)')

with open('/original_file.txt') as f:
    people = pattern.finditer(f.read())

for person in people:
    person_info = person.groupdict()
    # one output file per person, named after the captured name field
    with open(f'person_{person_info.get("name")}.txt', 'w') as w:
        for key, value in person_info.items():
            print(f'{key}:: {value}', file=w)

This approach matches each record in the file with a single regex pattern. The named groups capture the individual fields, so you can also pick out any one of them, e.g. person_info.get('phone').
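
To make the field access concrete, here is a small self-contained sketch that runs a pattern of the same shape on an in-memory string instead of a file. The example.com addresses are placeholders I made up, and parse_contacts and safe_filename are hypothetical helpers. The second one is there because a name taken straight from the data may contain characters you would not want in a filename:

```python
import re

# Same idea as the pattern above, but [ \t]* keeps the separator
# from running across a line break when a field is empty.
RECORD = re.compile(
    r'name::[ \t]*(?P<name>.+?)\n'
    r'phone::[ \t]*(?P<phone>\d+)\n'
    r'email::[ \t]*(?P<email>.+?)\n'
    r'address::[ \t]*(?P<address>.*?)\n'
    r'note::[ \t]*(?P<note>.*)'
)


def parse_contacts(text):
    """Return one dict of fields per record matched in `text`."""
    return [match.groupdict() for match in RECORD.finditer(text)]


def safe_filename(name):
    """Replace runs of characters other than letters, digits, _ and - with _."""
    return re.sub(r'[^\w-]+', '_', name).strip('_') + '.txt'


sample = (
    'name:: Joe Blogs\nphone:: 123456789\nemail:: joe@example.com\n'
    'address:: 123 Main Street\nnote:: blah blah blah\n\n'
    'name:: John Smith\nphone:: 23498689\nemail:: john@example.com\n'
    'address:: 1 North Street\nnote:: Some more blah\n'
)

contacts = parse_contacts(sample)
print(contacts[0]['phone'])                # 123456789
print(safe_filename(contacts[1]['name']))  # John_Smith.txt
```

Keep in mind that a pattern like this only matches records that have all five fields in this exact order. Anything else is silently skipped, so the blank-line approaches are probably safer if your data is irregular.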