Reading an 8x8 integer matrix CSV file in Python

I have a csv file generated from another program which looks like this:

45, 133, 148, 213,  65,  26,  22,  73
 84,  51,  41, 249,  25, 167, 102,  72
217, 198, 117, 123, 160,   9, 210, 211
230,  64,  37, 215,  91,  76, 240, 163
123, 169, 197,  16, 225, 160,  68,  65
 89, 247, 170,  88, 173, 206, 158, 235
144, 138, 188, 164,  84,  38,  67,  29
 98,  23, 106, 159,  96,   7,  77,  67
 
142, 140, 240,  56, 176,   0, 131, 160
241, 199,  96, 245, 213, 218,  51,  75
 22, 226,  81, 106,  94, 252, 252, 110
  0,  96, 132,  38, 189, 150, 162, 177
 95, 252, 107, 181,  72,   7,   0, 247
228, 207, 203, 128,  91, 158, 164, 116
 70, 124,  20,  37, 225, 169, 245, 103
103, 229, 186, 108, 151, 170,  18, 168

 52,  86, 244, 244, 150, 181,   9, 146
115,  60,  50, 162,  70, 253,  43,  94
201,  72, 132, 207, 181, 106, 136,  70
 92,   7,  97, 222, 149, 145, 155, 255
 55, 188,  90,  58, 124, 230, 215, 229
231,  60,  48, 150, 179, 247, 104, 162
 45, 241, 178, 122, 149, 243, 236,  92
186, 252, 165, 162, 176,  87, 238,  29

There is always a blank line following each 8x8 integer matrix.

I need to read each 8x8 matrix into a Python program, perform an operation on it, and then write the result in the same format. The result will be an 8x8 matrix of floats, with a blank line following each 8x8 matrix.

How do I do these two things in Python 3.x? I could read the file bit by bit, but perhaps Python has a robust way to do this using a small amount of code.

CodePudding user response：

It's actually quite easy to do that with list and generator comprehensions. I've spread things out over multiple lines so it's more readable, but that's a personal preference.

def read_matrices(file):
    with open(file) as f:
        return [
            [
                [
                    float(coeff)
                    for coeff in line.split(",")
                ]
                for line in matrix.split("\n")
                if line.replace(" ", "") != ""
            ]
            for matrix in f.read().split("\n\n")
        ]

def write_matrices(matrices, file):
    text = "\n\n".join(
        "\n".join(
            ",".join(str(coeff) for coeff in line)
            for line in matrix
        )
        for matrix in matrices
    )

    with open(file, "w") as f:
        f.write(text + "\n")  # If you want it to be newline-terminated
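
Note that splitting on "\n\n" assumes the separator lines are completely empty. If a separator line contains a stray space (as the first one in the sample appears to), neighbouring matrices would be merged. A minimal sketch of a more tolerant reader, assuming any whitespace-only line ends a matrix (the function name and the in-memory sample are only for illustration):

```python
import io

def read_matrices_tolerant(lines):
    """Group comma-separated rows into matrices.

    Any line that is empty or contains only whitespace ends the current matrix.
    """
    matrices, current = [], []
    for line in lines:
        if line.strip():
            # float() ignores surrounding spaces and the trailing newline
            current.append([float(x) for x in line.split(",")])
        elif current:
            matrices.append(current)
            current = []
    if current:  # last matrix when the file has no trailing separator
        matrices.append(current)
    return matrices

# Tiny 2x2 stand-in for the 8x8 data; the separator line holds a single space.
sample = io.StringIO("1, 2\n3, 4\n \n5, 6\n7, 8\n")
matrices = read_matrices_tolerant(sample)

# Example operation: double every coefficient.
doubled = [[[2 * v for v in row] for row in m] for m in matrices]
```

Any iterable of lines works here, so an open file object can be passed in place of the StringIO.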

CodePudding user response：

If you already know that your matrices have 8 rows, you can use pandas.read_csv to load all the data in a numpy array, and just reshape it afterwards.

If you don't know the number of rows per matrix beforehand, pandas.read_csv can turn each blank line into a row of all NaN, which allows you to infer the number of rows per matrix and then do the reshape:

import numpy as np
import pandas as pd

def read_csv(file, num_rows=None):
    if num_rows is not None:
        df = pd.read_csv(file, header=None, skip_blank_lines=True)
        arr = df.values
    else:
        df = pd.read_csv(file, header=None, skip_blank_lines=False)
        num_rows = extract_matrices_num_rows(df)
        valid_idxs = np.delete(
            np.arange(len(df)), np.arange(num_rows, len(df), num_rows + 1)
        )
        arr = df.iloc[valid_idxs].values

    return arr.reshape(-1, num_rows, arr.shape[-1])

def extract_matrices_num_rows(df):
    blank_lines_indices = all_nans_indices(df)
    blank_lines_indices = [-1, *blank_lines_indices, len(df)]
    num_rows = np.diff(blank_lines_indices) - 1
    num_rows = set(num_rows)
    if len(num_rows) > 1:
        raise ValueError(
            f"Matrices detected to have various number of rows: {num_rows}"
        )
    return num_rows.pop()

def all_nans_indices(df):
    return list(df[df.isnull().all(axis=1)].index)

A quick check that both cases give the same result:

file = "data.csv"

assert np.array_equal(read_csv(file), read_csv(file, num_rows=8))
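
To make the index arithmetic in read_csv easier to follow, here is a small sketch on synthetic data, assuming three matrices of 8 rows separated by one blank line each (26 lines in total). The array contents are made up purely for illustration:

```python
import numpy as np

num_rows = 8
total_lines = 3 * num_rows + 2  # three matrices plus two separator lines

# Separator lines sit at every (num_rows + 1)-th position, starting at num_rows.
blank_idxs = np.arange(num_rows, total_lines, num_rows + 1)

# Dropping those positions leaves only the indices of real data rows.
valid_idxs = np.delete(np.arange(total_lines), blank_idxs)

# Stand-in for the stacked data rows: 24 rows of 8 columns.
stacked = np.arange(len(valid_idxs) * 8).reshape(len(valid_idxs), 8)

# One 3-D array: (number of matrices, rows per matrix, columns).
matrices = stacked.reshape(-1, num_rows, stacked.shape[-1])
print(blank_idxs, matrices.shape)
```

Each matrices[k] is then a regular 8x8 array, so an operation can be applied to one matrix at a time or to the whole stack at once.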

CodePudding user response：

perhaps Python has a robust way to do this using small amount of code

Actually, it has. As one option, you can use the pandas module. Here is an example:

import pandas as pd

df = pd.read_csv('mtrx.csv', header=None, chunksize=9)
for i, matrix in enumerate(df):
    matrix.mul(10**i).fillna('').to_csv('mtrx1.csv', index=False, header=False, mode='a')

This code multiplies each matrix by 10 to the power of i, and the resulting file looks like this:

45,133.0,148.0,213.0,65.0,26.0,22.0,73.0
84,51.0,41.0,249.0,25.0,167.0,102.0,72.0
217,198.0,117.0,123.0,160.0,9.0,210.0,211.0
230,64.0,37.0,215.0,91.0,76.0,240.0,163.0
123,169.0,197.0,16.0,225.0,160.0,68.0,65.0
89,247.0,170.0,88.0,173.0,206.0,158.0,235.0
144,138.0,188.0,164.0,84.0,38.0,67.0,29.0
98,23.0,106.0,159.0,96.0,7.0,77.0,67.0
 ,,,,,,,
1420.0,1400.0,2400.0,560.0,1760.0,0.0,1310.0,1600.0
2410.0,1990.0,960.0,2450.0,2130.0,2180.0,510.0,750.0
220.0,2260.0,810.0,1060.0,940.0,2520.0,2520.0,1100.0
0.0,960.0,1320.0,380.0,1890.0,1500.0,1620.0,1770.0
950.0,2520.0,1070.0,1810.0,720.0,70.0,0.0,2470.0
2280.0,2070.0,2030.0,1280.0,910.0,1580.0,1640.0,1160.0
700.0,1240.0,200.0,370.0,2250.0,1690.0,2450.0,1030.0
1030.0,2290.0,1860.0,1080.0,1510.0,1700.0,180.0,1680.0
,,,,,,,
5200,8600,24400,24400,15000,18100,900,14600
11500,6000,5000,16200,7000,25300,4300,9400
20100,7200,13200,20700,18100,10600,13600,7000
9200,700,9700,22200,14900,14500,15500,25500
5500,18800,9000,5800,12400,23000,21500,22900
23100,6000,4800,15000,17900,24700,10400,16200
4500,24100,17800,12200,14900,24300,23600,9200
18600,25200,16500,16200,17600,8700,23800,2900

Update

The lines that contain only commas correspond to rows in the CSV file that have no data, i.e. the empty rows.
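
If the comma-only rows should become real blank lines again, one option is a small post-processing pass over the written text. This is only a sketch; the helper name blank_out_empty_rows is made up here and it is not part of pandas:

```python
def blank_out_empty_rows(text):
    """Replace rows made up only of commas and whitespace with empty lines."""
    out = []
    for line in text.splitlines(keepends=True):
        if line.strip(" ,\r\n"):
            out.append(line)   # a row with real data: keep it unchanged
        else:
            out.append("\n")   # a comma-only row: emit a truly blank line
    return "".join(out)

cleaned = blank_out_empty_rows("1.0,2.0\n ,\n3.0,4.0\n")
print(cleaned)
```

The same function could be applied to the contents of the output file after the loop has finished writing it.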

CodePudding user response：

The solution below uses pandas and NumPy. As an example operation, it adds 2 to each value of every matrix (the expression df.values[i:i+8] + 2). The output is written as CSV in the same layout as the input, with an empty (comma-only) row between matrices.

import pandas as pd
import numpy as np
df = pd.read_csv('Book2.csv', skip_blank_lines=False, header=None)

updated_metrcies = [np.vstack([df.values[i:i+8] + 2, np.repeat(np.nan, df.shape[1])]) for i in range(0, df.shape[0], 9) if i < df.shape[0]]

pd.DataFrame(np.vstack(updated_metrcies)[:-1]).to_csv('Book4.csv', index=False, header=None)
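
Since the NaN separator row is written out as a row of commas, here is an alternative sketch that writes a truly blank line between matrices using np.savetxt. The function name write_matrices, the default format string and the 2x2 demo data are assumptions made for illustration:

```python
import io
import numpy as np

def write_matrices(matrices, fh, fmt="%8.2f"):
    """Write each 2-D matrix as CSV rows, with one blank line between matrices."""
    for k, matrix in enumerate(matrices):
        if k:
            fh.write("\n")  # blank separator line before every matrix but the first
        np.savetxt(fh, matrix, fmt=fmt, delimiter=",")

# Tiny 2x2 stand-in for a stack of 8x8 float matrices.
demo = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]) + 0.5
buf = io.StringIO()
write_matrices(demo, buf, fmt="%.1f")
print(buf.getvalue())
```

A file opened with open(path, "w") can be passed instead of the StringIO buffer, and the fmt argument controls the width and precision of the floats.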