I have the following problem that I can't find a solution for:
A .CSV file contains data for multiple 2D arrays as follows:
# Date:20221027-151458
# Array shape (number, width, height): 3, 4, 4)
# Some comments about the data
# some more
1; 2; 3; 4
5; 6; 7; 8
9; 10; 11; 12
#new slice
20; 21; 23; 24
25; 26; 27; 28
29; 30; 31; 32
#new slice
100; 101; 102; 103
104; 105; 106; 107
108; 109; 110; 111
#new slice
1000; 1001; 1002; 1003
1004; 1005; 1006; 1007
1008; 1009; 1010; 1011
My goal is to read the CSV into a 3D array, where every matrix delimited by a "#new slice" comment becomes its own slice of the 3D array.
Edit: The result should look like this:
irdata([[[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]],
[[20, 21, 23, 24],
[25, 26, 27, 28],
[29, 30, 31, 32]],
[[100, 101, 102, 103],
[104, 105, 106, 107],
[108, 109, 110, 111]],
[[1000, 1001, 1002, 1003],
[1004, 1005, 1006, 1007],
[1008, 1009, 1010, 1011]]])
Can you help me find a way to do this?
Very best
Christian
I tried numpy.loadtxt, but it gives me the whole dataset as a single 2D array (in this example a (100, 10) array). pandas gives me a 2D array as well, but with the comments included.
CodePudding user response:
You can try:
text = '''# Date:20221027-151458
# Array shape (number, width, height): 3, 4, 4)
# Some comments about the data
# some more
1; 2; 3; 4
5; 6; 7; 8
9; 10; 11; 12
#new slice
20; 21; 23; 24
25; 26; 27; 28
29; 30; 31; 32
#new slice
100; 101; 102; 103
104; 105; 106; 107
108; 109; 110; 111
#new slice
1000; 1001; 1002; 1003
1004; 1005; 1006; 1007
1008; 1009; 1010; 1011
'''
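import re
import numpy as np

# Split at each comment line that is directly followed by a digit; chunk 0 holds
# only the header comments, so skip it. Each remaining chunk is one slice.
a = (np.dstack([np.vstack([np.fromstring(l, sep=';', dtype='int') for l in s.strip().split('\n')])
for s in re.split(r'#.*\n(?=\d)', text)[1:]])
.T.swapaxes(1,2)
)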
a.shape
# (4, 3, 4)
output:
array([[[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]],
[[ 20, 21, 23, 24],
[ 25, 26, 27, 28],
[ 29, 30, 31, 32]],
[[ 100, 101, 102, 103],
[ 104, 105, 106, 107],
[ 108, 109, 110, 111]],
[[1000, 1001, 1002, 1003],
[1004, 1005, 1006, 1007],
[1008, 1009, 1010, 1011]]])
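If every slice has the same number of rows and you know that number up front, a simpler route might be to let numpy.loadtxt drop the comment lines (it skips lines starting with '#' by default) and then reshape the resulting 2D array. This is only a sketch under that assumption: ROWS_PER_SLICE is a hypothetical constant, not something read from the file, and the sample text is shortened to two slices.

```python
import io
import numpy as np

# Shortened sample in the same format as the question (two slices only).
text = """# Date:20221027-151458
# Some comments about the data
1; 2; 3; 4
5; 6; 7; 8
9; 10; 11; 12
#new slice
20; 21; 23; 24
25; 26; 27; 28
29; 30; 31; 32
"""

# Assumption: the row count per slice is known and identical for all slices.
ROWS_PER_SLICE = 3

# loadtxt ignores everything from '#' to the end of a line (comments='#'),
# so the header and the "#new slice" markers are dropped automatically.
flat = np.loadtxt(io.StringIO(text), delimiter=';', dtype=int)

# flat has shape (total_rows, columns); -1 lets NumPy infer the slice count.
stacked = flat.reshape(-1, ROWS_PER_SLICE, flat.shape[1])
print(stacked.shape)
```

For a real file you would pass the file name to loadtxt instead of the StringIO object. If the slices can differ in length, the reshape will fail or group rows incorrectly, and splitting on the "#new slice" markers as in the answer above is the safer choice.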