I have a text file in which columns are separated by more than one space. The problem is that the values within a column can also contain spaces, but never more than one in a row. So it may look like this:
aaaxx     123 A   xyz   456 BB
zcbb      a b     XYZ   xtz 1
cdddtr    a       111   tddw
Is there any way to read such a table? I've tried a few approaches and I think I have to use some kind of regular expression for a delimiter, but honestly I have no idea how to resolve this.
CodePudding user response:
Another solution, using pandas:
import pandas as pd
df = pd.read_csv("your_file.txt", sep=r"\s{2,}", engine="python", header=None)
print(df)
Prints:
        0      1    2       3
0   aaaxx  123 A  xyz  456 BB
1    zcbb    a b  XYZ   xtz 1
2  cdddtr      a  111    tddw
CodePudding user response:
You probably want to use a regexp:
import re

content = """aaaxx     123 A   xyz   456 BB
zcbb      a b     XYZ   xtz 1
cdddtr    a       111   tddw
"""

# Split the content on new lines
rows = content.split("\n")

# Create a 2D list (table) out of the values
table = []
for row in rows:
    row_arr = []
    # The "[ ]" is the regexp equivalent of "space" and {2,} means 2 or more
    for column in re.split(r"[ ]{2,}", row):
        # Skip empty strings, such as the one produced by a blank line
        if column:
            row_arr.append(column)
    # If the row is empty, don't add it to the table
    if len(row_arr):
        table.append(row_arr)

print(table)
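The same idea can be wrapped in a small function that accepts any iterable of lines. This is only a minimal sketch: the names `parse_table` and `COLUMN_SEP` are made up for illustration, the sample lines use assumed spacing, and it assumes columns are separated by two or more plain spaces (not tabs).

```python
import re

# Two or more consecutive spaces mark a column boundary.
COLUMN_SEP = re.compile(r" {2,}")


def parse_table(lines):
    """Split each non-blank line into columns (hypothetical helper)."""
    table = []
    for line in lines:
        line = line.strip()  # drop the trailing newline and outer padding
        if not line:
            continue  # skip blank lines
        table.append(COLUMN_SEP.split(line))
    return table


# Illustrative input; the exact spacing is an assumption.
sample = [
    "aaaxx     123 A   xyz   456 BB\n",
    "zcbb      a b     XYZ   xtz 1\n",
    "\n",
    "cdddtr    a       111   tddw\n",
]
table = parse_table(sample)
print(table)
```

Since an open file is itself an iterable of lines, passing the file object (for example from `with open("your_file.txt") as f:`) to `parse_table` should work the same way.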
CodePudding user response:
Here are two implementations that I would use. They rely on parity: splitting on two spaces keeps values separated by a single space together; values separated by an even number of spaces are split cleanly, leaving only empty strings in between; and when the number of spaces is odd, the one leftover space is removed with the strip method. The remaining empty strings are filtered out.
content = """aaaxx     123 A   xyz   456 BB
zcbb      a b     XYZ   xtz 1
cdddtr    a       111   tddw"""


def split_file_content(file_content: str) -> list[list[str]]:
    """If you don't like regex"""
    return [
        # Split on two spaces; drop pieces that are empty or a lone space
        [part.strip() for part in row.split("  ") if part.strip()]
        for row in file_content.split("\n")
    ]


def split_file_content_loops(file_content: str) -> list[list[str]]:
    """If you don't like regex AND list comprehensions"""
    table = []
    for row in file_content.split("\n"):
        values = []
        for part in row.split("  "):
            if part.strip():
                values.append(part.strip())
        table.append(values)
    return table


print(split_file_content(content))
print(split_file_content_loops(content))
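To see why the parity argument holds, the sketch below prints the raw pieces produced by splitting on two spaces for gaps of one to five spaces. `split_on_double_space` is a hypothetical helper written for this illustration; it filters on the stripped piece so that a lone leftover space (for example from trailing padding) is dropped as well.

```python
def split_on_double_space(row: str) -> list[str]:
    """Split on two-space chunks, then strip and drop blank leftovers."""
    return [part.strip() for part in row.split("  ") if part.strip()]


for gap in range(1, 6):
    row = "a b" + " " * gap + "c"
    # Show the raw split next to the cleaned result for each gap width.
    print(gap, row.split("  "), split_on_double_space(row))
```

A gap of one space leaves the row in one piece, even gaps produce empty strings between the values, and odd gaps leave a single leading space on the next value, which the strip call removes.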