Recognizing a new array when reading a CSV into a 3D array with Python

Time:11-03

I have the following problem that I can't find a solution for:

A .CSV file contains data for multiple 2D arrays as follows:

# Date:20221027-151458
# Array shape (number, width, height): 3, 4, 4)
# Some comments about the data
# some more
1; 2; 3; 4
5; 6; 7; 8
9; 10; 11; 12
#new slice
20; 21; 23; 24
25; 26; 27; 28
29; 30; 31; 32
#new slice
100; 101; 102; 103
104; 105; 106; 107
108; 109; 110; 111
#new slice
1000; 1001; 1002; 1003
1004; 1005; 1006; 1007
1008; 1009; 1010; 1011

My goal is to read the CSV into a 3D array, with every matrix between the "#new slice" comments becoming a new array along the third dimension.

Edit: The result should look like this:

irdata([[[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12]],

        [[20, 21, 23, 24],
         [25, 26, 27, 28],
         [29, 30, 31, 32]],

        [[100, 101, 102, 103],
         [104, 105, 106, 107],
         [108, 109, 110, 111]],

        [[1000, 1001, 1002, 1003],
         [1004, 1005, 1006, 1007],
         [1008, 1009, 1010, 1011]]])

Can you help me find a way to do this?

Very best

Christian

I tried numpy.loadtxt, but it gives me the whole dataset as a single 2D array (in this example a (100, 10) array). pandas also gives me a 2D array, but with the comments included.
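
The loadtxt behaviour described above can be reproduced with a minimal sketch. It assumes a shortened stand-in for the file (the `sample` buffer is hypothetical) and that every slice has the same number of rows, in which case the flat 2D result could be reshaped afterwards. The name `rows_per_slice` is an illustrative assumption, not something read from the file.

```python
import io
import numpy as np

# Shortened stand-in for the CSV file; a real file path could be passed instead.
sample = io.StringIO(
    "# some header comment\n"
    "1; 2; 3; 4\n"
    "5; 6; 7; 8\n"
    "9; 10; 11; 12\n"
    "#new slice\n"
    "20; 21; 23; 24\n"
    "25; 26; 27; 28\n"
    "29; 30; 31; 32\n"
)

# loadtxt skips every line starting with '#', so the slice markers are lost
# and all data rows end up in one 2D array.
flat = np.loadtxt(sample, delimiter=";").astype(int)
print(flat.shape)  # (6, 4)

# Assumption: every slice has the same, known number of rows.
rows_per_slice = 3
stacked = flat.reshape(-1, rows_per_slice, flat.shape[1])
print(stacked.shape)  # (2, 3, 4)
```

This only works when the slice height is fixed and known in advance; it does not actually detect the "#new slice" lines.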

Answer:

You can try:

text = '''# Date:20221027-151458
# Array shape (number, width, height): 3, 4, 4)
# Some comments about the data
# some more
1; 2; 3; 4
5; 6; 7; 8
9; 10; 11; 12
#new slice
20; 21; 23; 24
25; 26; 27; 28
29; 30; 31; 32
#new slice
100; 101; 102; 103
104; 105; 106; 107
108; 109; 110; 111
#new slice
1000; 1001; 1002; 1003
1004; 1005; 1006; 1007
1008; 1009; 1010; 1011
'''

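import re
import numpy as np

# Split the text on each comment line that is directly followed by a data row.
# The first piece is the leftover header, so drop it; each remaining piece is one slice.
a = (np.dstack([np.vstack([np.fromstring(l, sep=';', dtype='int') for l in s.strip().split('\n')])
                for s in re.split(r'#.*\n(?=\d)', text)[1:]])
       .T.swapaxes(1,2)   # (rows, cols, slices) -> (slices, rows, cols)
    )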

a.shape
# (4, 3, 4)

output:

array([[[   1,    2,    3,    4],
        [   5,    6,    7,    8],
        [   9,   10,   11,   12]],

       [[  20,   21,   23,   24],
        [  25,   26,   27,   28],
        [  29,   30,   31,   32]],

       [[ 100,  101,  102,  103],
        [ 104,  105,  106,  107],
        [ 108,  109,  110,  111]],

       [[1000, 1001, 1002, 1003],
        [1004, 1005, 1006, 1007],
        [1008, 1009, 1010, 1011]]])
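
As an alternative to the regex split, the same grouping can be sketched as a plain line-by-line loop, which also works directly on an open file. This is only a sketch under stated assumptions: the helper name `read_slices` and its `marker` and `sep` parameters are hypothetical, all values are assumed to be integers, and every slice is assumed to have the same shape (otherwise `np.array` cannot build a regular 3D array).

```python
import numpy as np

def read_slices(lines, marker="#new slice", sep=";"):
    """Group data rows into 2D blocks, starting a new block at each marker line."""
    slices, current = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            # Only the marker closes a slice; other comment lines are ignored.
            if line.lower().startswith(marker) and current:
                slices.append(current)
                current = []
            continue
        current.append([int(v) for v in line.split(sep)])
    if current:
        slices.append(current)  # the last slice has no trailing marker
    return np.array(slices)

# Small stand-in for the file contents; with a real file one could pass the
# open file object instead, e.g. read_slices(open("data.csv")) (hypothetical path).
demo = """# header comment
1; 2
3; 4
#new slice
5; 6
7; 8
"""
arr = read_slices(demo.splitlines())
print(arr.shape)  # (2, 2, 2)
```

The result is already ordered as (slices, rows, columns), so no transpose or axis swap is needed.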