Using Python, how do you repeatedly extract regularly formatted chunks of text from a .txt file, and wr

Time:07-15

Suppose you have a text file with contact information (e.g. name, phone, email, etc.) for multiple people.

Using the example information below and Python (and perhaps regex), how would you extract the bundled information for each person and write it to a new file? The result would be a separate text file for each person containing only their info.

So, the input text looks something like this:

name:: Joe Blogs 
phone:: 123456789
email:: [email protected]
address:: 123 Main Street
note:: blah blah blah

name:: Josephine Blogs 
phone:: 43217890
email:: [email protected]
address:: 123 Main Street
note:: More blah blah

name:: John Smith 
phone:: 23498689
email:: [email protected]
address:: 1 North Street
note:: Some more blah

CodePudding user response:

If the file easily fits in memory, you can simply split on \n\n:

with open('file.txt') as f:
    for n, chunk in enumerate(f.read().split('\n\n'), start=1):
        with open(f'chunk_{n}.txt', 'w') as f_out:
            f_out.write(chunk)

file.txt:

name:: Joe Blogs 
phone:: 123456789
email:: [email protected]
address:: 123 Main Street
note:: blah blah blah

name:: Josephine Blogs 
phone:: 43217890
email:: [email protected]
address:: 123 Main Street
note:: More blah blah

name:: John Smith 
phone:: 23498689
email:: [email protected]
address:: 1 North Street
note:: Some more blah

output files:

chunk_1.txt

name:: Joe Blogs 
phone:: 123456789
email:: [email protected]
address:: 123 Main Street
note:: blah blah blah

chunk_2.txt

name:: Josephine Blogs 
phone:: 43217890
email:: [email protected]
address:: 123 Main Street
note:: More blah blah

chunk_3.txt

name:: John Smith 
phone:: 23498689
email:: [email protected]
address:: 1 North Street
note:: Some more blah
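
Splitting on exactly '\n\n' assumes a single empty line between records. As a minimal sketch, assuming records may instead be separated by several blank lines or by lines containing only whitespace, a regex split is more forgiving. The helper name split_records is hypothetical and not part of the original answer:

```python
import re

def split_records(text):
    """Split text into records separated by one or more blank lines."""
    # '\n\s*\n' matches a newline, any run of whitespace (which may
    # include further newlines), and a closing newline.
    chunks = re.split(r'\n\s*\n', text.strip())
    # Drop empty strings, e.g. when the input itself is empty.
    return [chunk for chunk in chunks if chunk]

sample = (
    "name:: Joe Blogs \nphone:: 123456789\n"
    "\n\n"
    "name:: John Smith \nphone:: 23498689\n"
    "  \n"
)
records = split_records(sample)
```

Each element of records can then be written to its own file with the same enumerate loop as above.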

For larger files, you can use a more classical loop that writes the lines one at a time and switches to a new output file whenever it encounters a blank line:

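with open('file.txt') as f:
    n = 1
    f_out = open(f'chunk_{n}.txt', 'w')
    for line in f:
        if line.strip():
            # non-blank line: it belongs to the current record
            f_out.write(line)
        else:
            # blank line: close the current file and start the next one
            f_out.close()
            n += 1
            f_out = open(f'chunk_{n}.txt', 'w')
    f_out.close()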

output:

==> chunk_1.txt <==
name:: Joe Blogs 
phone:: 123456789
email:: [email protected]
address:: 123 Main Street
note:: blah blah blah

==> chunk_2.txt <==
name:: Josephine Blogs 
phone:: 43217890
email:: [email protected]
address:: 123 Main Street
note:: More blah blah

==> chunk_3.txt <==
name:: John Smith 
phone:: 23498689
email:: [email protected]
address:: 1 North Street
note:: Some more blah
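
The loop above opens a new file on every blank line, so a trailing blank line or two blank lines in a row would leave an empty chunk file behind. A possible variation, sketched here with itertools.groupby from the standard library, groups consecutive non-blank lines while still reading the file lazily. The function names iter_records and write_records and the prefix parameter are hypothetical:

```python
from itertools import groupby

def iter_records(lines):
    """Yield each record as a list of lines, skipping runs of blank lines."""
    # groupby bundles consecutive lines that share the same key:
    # True for blank lines, False for lines with content.
    for is_blank, group in groupby(lines, key=lambda line: not line.strip()):
        if not is_blank:
            yield list(group)

def write_records(in_path, prefix='chunk'):
    """Write every record found in in_path to its own numbered file."""
    with open(in_path) as f:
        for n, record in enumerate(iter_records(f), start=1):
            with open(f'{prefix}_{n}.txt', 'w') as f_out:
                f_out.writelines(record)

# Example call, assuming file.txt exists in the working directory:
# write_records('file.txt')
```

Only one record is held in memory at a time, and each output file is closed by its with block even if an error occurs.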

CodePudding user response:

Here's another neat implementation with regex.

import re

# One match per person; each named group captures one field.
pattern = re.compile(
    r'name::\s*(?P<name>.+?)\s*\n'
    r'phone::\s*(?P<phone>\d+)\s*\n'
    r'email::\s*(?P<email>.+?)\s*\n'
    r'address::\s*(?P<address>.*?)\s*\n'
    r'note::\s*(?P<note>.*)'
)

with open('/original_file.txt') as f:
    for person in pattern.finditer(f.read()):
        person_info = person.groupdict()
        # the file is named after the captured name field
        with open(f'person_{person_info.get("name")}.txt', 'w') as w:
            for key, value in person_info.items():
                print(f'{key}:: {value}', file=w)

This approach matches each person's block of lines with a single regex pattern. Because the pattern captures every field in a named group, you can also retrieve an individual field, for example person_info.get('phone').
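
If the fields do not always appear in the same order, or some are missing, a fixed pattern will not match. As a rough alternative sketch, assuming every line has the form 'key:: value', each chunk can be parsed into a dict with str.partition, and the name can be cleaned up before it is used in a filename. The helpers parse_record and safe_filename are hypothetical names introduced for illustration:

```python
import re

def parse_record(chunk):
    """Turn 'key:: value' lines into a dict; lines without '::' are ignored."""
    fields = {}
    for line in chunk.splitlines():
        # partition splits on the first '::' only, so values may contain '::'
        key, sep, value = line.partition('::')
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def safe_filename(name):
    """Replace anything other than letters, digits, '-' and '_' with '_'."""
    return re.sub(r'[^A-Za-z0-9_-]+', '_', name.strip())

record = parse_record("name:: Joe Blogs \nphone:: 123456789\nnote:: blah blah blah")
filename = f"person_{safe_filename(record['name'])}.txt"
```

Here record.get('phone') works the same way as person_info.get('phone') in the regex version, and the trailing space after the name no longer ends up in the filename.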
