I have a text file that contains a table like the following:
---
Title of my file
Subtitle of my file
---
------ ------------------- ------
| a | aa | aaa |
| b | bb | bbb |
| c | cc | ccc |
| d | dd | ddd | # Section 1
| e | ee | eee |
| f | ff | fff |
====== =================== ======
| g | gg | ggg |
| h | hh | hhh |
| i | ii | iii | # Section 2
| j | jj | jjj |
| k | kk | kkk |
| l | ll | lll |
------ ------------------- ------
I'm trying to parse it with Python to capture each section into a separate list, section_1_list and section_2_list, with each list containing the lines in that section. For example, section_1_list would be:
section_1_list = [
"| a | aa | aaa |",
"| b | bb | bbb |",
"| c | cc | ccc |",
"| d | dd | ddd |",
"| e | ee | eee |",
"| f | ff | fff |"
]
Notice that this is without the dividing lines.
So my question is: how can I write my loop so that I ignore the dividing lines and gather the other lines into their own lists?
**What I have tried:**
Extract Values between two strings in a text file using python
Python read specific lines of text between two strings
**What I currently have:**
with open(txt_file_path) as f:
    lines = f.readlines()
    row_start = False
    for line in lines:
        if "-----" in line or "=====" in line:
            block_text = []
            row_start = not row_start
        while row_start == True:
            block_text.append(line)
Edit: I say "repeatedly" in the title because I have around 16 of these blocks in the text file.
CodePudding user response:
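Try the following approach.
- Read the contents of the file.
- Split the data on the divider lines of the table (using re), so the dividers themselves are discarded.
- Keep only the chunks that contain table rows, which drops the title block before the first divider.
See the following code:
import re

with open(txt_file_path, "r") as f:
    data = f.read()

# A divider is a line that starts with '-' or '=' and holds at least five
# divider characters, so the short '---' lines around the title do not match.
chunks = re.split(r"^[ \t]*[-=][-= \t]{4,}$\n?", data, flags=re.MULTILINE)
# Each chunk that starts with a table row becomes one list of lines.
block_text = [chunk.splitlines() for chunk in chunks if chunk.lstrip().startswith("|")]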
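A related regex variant, not taken from the answer above, is to match each run of consecutive table rows directly instead of splitting on the dividers. The sketch below assumes every row starts with `|` (after optional indentation); the `sample` string, `ROW_RUN` and `find_row_blocks` are hypothetical names used only for illustration.

```python
import re

# Hypothetical sample mirroring the layout in the question.
sample = """\
---
Title of my file
Subtitle of my file
---
------ ------------------- ------
| a | aa | aaa |
| b | bb | bbb |
====== =================== ======
| g | gg | ggg |
| h | hh | hhh |
------ ------------------- ------
"""

# Assumption: every table row starts with '|' (after optional indentation).
# The pattern matches one or more such lines in a row as a single block.
ROW_RUN = re.compile(r"(?:^[ \t]*\|.*\n?)+", flags=re.MULTILINE)


def find_row_blocks(text):
    """Return one list of lines for each run of consecutive table rows."""
    return [match.group(0).splitlines() for match in ROW_RUN.finditer(text)]


blocks = find_row_blocks(sample)
print(blocks)
```

Because only row lines are matched, the title block and every divider are skipped without having to describe what a divider looks like. The trade-off is that any non-row line inside a section (a blank line, for instance) would split that section in two.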
CodePudding user response:
Here's how I would do it:
from pprint import pprint
file_contents = """\
---
Title of my file
Subtitle of my file
---
------ ------------------- ------
| a | aa | aaa |
| b | bb | bbb |
| c | cc | ccc |
| d | dd | ddd | # Section 1
| e | ee | eee |
| f | ff | fff |
====== =================== ======
| g | gg | ggg |
| h | hh | hhh |
| i | ii | iii | # Section 2
| j | jj | jjj |
| k | kk | kkk |
| l | ll | lll |
------ ------------------- ------ \
"""
lines = file_contents.split('\n')

# TODO update as needed
# Five characters, so the short '---' lines around the title are not dividers
start_end_line_prefixes = ('-----', '=====')

sections = []
curr_section = None

for line in lines:
    # A divider line closes the current section and opens a new one
    if line.lstrip().startswith(start_end_line_prefixes):
        curr_section = []
        sections.append(curr_section)
    elif curr_section is not None:
        curr_section.append(line)

# Remove empty list in last index (if needed)
if sections and not sections[-1]:
    sections.pop()

pprint(sections)
Output:
[['| a | aa | aaa |',
'| b | bb | bbb |',
'| c | cc | ccc |',
'| d | dd | ddd | # Section 1',
'| e | ee | eee |',
'| f | ff | fff |'],
['| g | gg | ggg |',
'| h | hh | hhh |',
'| i | ii | iii | # Section 2',
'| j | jj | jjj |',
'| k | kk | kkk |',
'| l | ll | lll |']]
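For a file with many such blocks, the same grouping can also be sketched with `itertools.groupby`, which works on any iterable of lines, including an open file. This is an alternative to the loop above, not part of it. It assumes every table row starts with `|` and that anything else (dividers, title lines) separates sections. `is_row`, `read_sections` and the `demo` list are hypothetical names for illustration.

```python
from itertools import groupby


def is_row(line):
    """Assumed rule: a table row starts with '|' after optional indentation."""
    return line.lstrip().startswith("|")


def read_sections(lines):
    """Group consecutive table rows; any non-row line ends the current group."""
    sections = []
    for rows_here, group in groupby(lines, key=is_row):
        if rows_here:
            # Strip the trailing newline that file iteration leaves on each line.
            sections.append([line.rstrip("\n") for line in group])
    return sections


# Hypothetical shortened sample; lines keep their newline, as when iterating a file.
demo = [
    "---\n", "Title of my file\n", "---\n",
    "------ ------\n",
    "| a | aa |\n", "| b | bb |\n",
    "====== ======\n",
    "| g | gg |\n",
    "------ ------\n",
]
print(read_sections(demo))
```

With a real file, passing the file object itself (`with open(txt_file_path) as f: sections = read_sections(f)`) should avoid loading all lines into memory first.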