Reading txt file with more than one space as a delimiter in Python

I have a text file in which columns are separated by more than one space. The problem is that the values within a column can also contain spaces, but never more than one in a row. So it may look like this:

aaaxx   123 A   xyz   456 BB 
zcbb  a b   XYZ   xtz 1 
cdddtr  a  111  tddw

Is there any way to read such a table? I've tried a few approaches and I think I have to use some kind of regular expression for a delimiter, but honestly I have no idea how to resolve this.

CodePudding user response：

Another solution, using pandas:

import pandas as pd

df = pd.read_csv("your_file.txt", sep=r"\s{2,}", engine="python", header=None)
print(df)

Prints:

        0      1    2       3
0   aaaxx  123 A  xyz  456 BB
1    zcbb    a b  XYZ   xtz 1
2  cdddtr      a  111    tddw

CodePudding user response：

You probably want to use a regexp:

import re

content = """aaaxx   123 A   xyz   456 BB 
zcbb  a b   XYZ   xtz 1 
cdddtr  a  111  tddw
"""

# Split the content on new lines
rows = content.split("\n")

# Create a 2D list (table) out of the values
table = []

for row in rows:
    row_arr = []
    # "[ ]" is the regexp equivalent of a space and "{2,}" means two or more
    for column in re.split("[ ]{2,}", row.strip()):
        # Skip the empty string that an empty row produces
        if column:
            row_arr.append(column)
    # If the row is empty, don't add it to the table
    if len(row_arr):
        table.append(row_arr)

print(table)
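
Since the question is about a file rather than a string, the same regular expression can also be applied line by line while reading. Below is a minimal sketch of that idea; the helper name read_table, the UTF-8 encoding and the temporary sample file are assumptions made for the demo, not part of the answer above.

```python
import re
import tempfile
from pathlib import Path

# Two or more consecutive spaces act as the column delimiter.
DELIMITER = re.compile(r" {2,}")


def read_table(path):
    """Hypothetical helper: parse a file whose columns are split by 2+ spaces."""
    table = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            # Drop the newline and any leading/trailing spaces first, so a
            # trailing single space does not end up in the last value.
            line = line.strip()
            if line:  # skip blank lines
                table.append(DELIMITER.split(line))
    return table


# Demo: write the sample from the question to a temporary file and read it back.
sample = (
    "aaaxx   123 A   xyz   456 BB \n"
    "zcbb  a b   XYZ   xtz 1 \n"
    "cdddtr  a  111  tddw\n"
)
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "sample.txt"
    sample_path.write_text(sample, encoding="utf-8")
    table = read_table(sample_path)

for row in table:
    print(row)
```

Each row comes back as a list of strings, with values such as "123 A" kept in one piece because they contain only a single space.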

CodePudding user response：

Here are two implementations that I would use. They are based on parity: splitting on two spaces keeps values separated by a single space together, values separated by an even number of spaces are split correctly, and the odd cases, which leave a stray space attached to a value, are cleaned up with the strip method. The remaining empty strings are filtered out.

content = """aaaxx   123 A   xyz   456 BB 
zcbb  a b   XYZ   xtz 1 
cdddtr  a  111  tddw"""


def split_file_content(file_content: str) -> list[list[str]]:
    """If you don't like regex"""
    return [
        [part.strip() for part in row.split("  ") if part.strip()]
        for row in file_content.split("\n")
    ]


def split_file_content_loops(file_content: str) -> list[list[str]]:
    """If you don't like regex AND list comprehensions"""
    table = []
    for row in file_content.split("\n"):
        values = []
        for part in row.split("  "):
            if part.strip():
                values.append(part.strip())
        table.append(values)
    return table


print(split_file_content(content))
print(split_file_content_loops(content))
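
To see why the parity argument holds, it may help to look at what splitting on two spaces returns for gaps of different widths. This is only a small illustration; the sample strings are made up for the demo.

```python
# Gap of 2 spaces (even): a clean split.
two = "a b  c".split("  ")
# Gap of 3 spaces (odd): the leftover space sticks to the next part.
three = "a b   c".split("  ")
# Gap of 4 spaces (even): an empty string appears between the parts.
four = "a b    c".split("  ")

print(two)    # ['a b', 'c']
print(three)  # ['a b', ' c']
print(four)   # ['a b', '', 'c']
```

Stripping each part and dropping the empty results handles all three cases, while the single space inside "a b" is never touched.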