Create 2D numpy array from buffer

Consider a system with n_channels channels, each transmitting n_samples samples at a given sampling rate. The 1D buffer holding the timestamps and the flat buffer holding the (n_channels, n_samples) data are allocated as follows:

from ctypes import c_double, c_float

# Assume a 2-second window, 3 channels, sampled at 1024 Hz
# data: (n_channels, n_samples) = (3, 2048)
# timestamps: (n_samples,) = (2048,)
n_channels = 3
n_samples = 2048
n_data_values = n_channels * n_samples
data_buffer = (c_float * n_data_values)()
ts_buffer = (c_double * n_samples)()

I have a C binary library that fills the buffer. The function can be summarized as:

from ctypes import byref

fill_buffers(
    byref(data_buffer),
    byref(ts_buffer),
)

At this point, I have 2 filled buffers, one with 2048 elements (timestamps) and one with 3 * 2048 elements (data). I want to load those 2 buffers into numpy arrays as efficiently as possible.

np.frombuffer seems great for reading a 1D array, e.g. the timestamps, but I can't find a counterpart for N-dimensional arrays.

import numpy as np

# read from buffer for the 1D array
timestamps = np.frombuffer(ts_buffer)  # 192 ns ± 1.11 ns per loop
timestamps = np.array(ts_buffer)  # 854 ns ± 2.99 ns per loop

For now, the data array is loaded with:

data = np.array(data_buffer).reshape(-1, n_channels, order="C").T

Is there a way to get the same efficiency as np.frombuffer while also specifying the output shape and the memory order?

This question is different from How can I initialize a NumPy array from a multidimensional buffer? and from How to restore a 2-dimensional numpy.array from a bytestring? because it does not ask for just any alternative to np.frombuffer, but for one that is as efficient.

EDIT: Why is np.frombuffer(data_buffer).reshape(-1, n_channels).T not working? With 3 channels and 1024 points (to speed up my testing), I get len(data_buffer) = 3072, but:

np.array(data_buffer).reshape(-1, 3).T.size  # 3072
np.frombuffer(data_buffer).reshape(-1, 3).T.size  # 1536
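
The halved size comes from the default dtype. A minimal sketch, assuming the same 3-channel, 1024-sample c_float buffer as in the EDIT (left zero-filled here, since only the element count matters): np.frombuffer reads raw bytes as float64 unless told otherwise, so two 4-byte floats get merged into one 8-byte element.

```python
import numpy as np
from ctypes import c_float, sizeof

n_channels = 3
n_samples = 1024
data_buffer = (c_float * (n_channels * n_samples))()

# Total size of the buffer in bytes: 3072 elements * 4 bytes each
n_bytes = sizeof(data_buffer)

# Default dtype is float64 (8 bytes per element), so the element count is halved
as_float64 = np.frombuffer(data_buffer)

# Matching dtype: one array element per c_float in the buffer
as_float32 = np.frombuffer(data_buffer, dtype=np.float32)

print(n_bytes, as_float64.size, as_float32.size)  # 12288 1536 3072
```

The values read with the default dtype are also meaningless, not only too few, because each one is built from the bytes of two unrelated floats.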

The application is a LabStreamingLayer buffer. The buffer is filled here https://github.com/labstreaminglayer/liblsl-Python/blob/87276974a311bcf7ceb3383e9d04c6bdcf302771/pylsl/pylsl.py#L854-L861 using the C library https://github.com/sccn/liblsl with specifically this function https://github.com/sccn/liblsl/blob/08aa186326e9a339316b7d5677ef31b3651b4aad/src/lsl_inlet_c.cpp#L180-L185

CodePudding user response：

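Does np.frombuffer(data_buffer, dtype=c_float).reshape(-1, n_channels, order="C").T not work correctly? np.frombuffer defaults to float64, so the dtype has to be given explicitly for a c_float buffer. With your current approach, np.array also treats the buffer as a 1D array until you reshape it anyway.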
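
As a rough check of why this is cheaper than np.array, here is a small sketch (the 4-sample size is a hypothetical value picked for illustration). It writes into the ctypes buffer after both arrays exist: the np.frombuffer result, including the reshape and the transpose, is a view on the buffer and sees the write, while np.array made a copy and does not.

```python
import numpy as np
from ctypes import c_float

n_channels = 3
n_samples = 4  # hypothetical small size, for illustration only
data_buffer = (c_float * (n_channels * n_samples))()

# View on the buffer: frombuffer, reshape and .T copy no bytes here
flat = np.frombuffer(data_buffer, dtype=np.float32)
data = flat.reshape(-1, n_channels).T  # shape (n_channels, n_samples)

# Copy of the buffer, as produced by np.array(...)
copied = np.array(data_buffer).reshape(-1, n_channels).T

# Write into the ctypes buffer after both arrays exist.
# Flat index = sample * n_channels + channel
data_buffer[0] = 1.5  # sample 0, channel 0
data_buffer[4] = 2.5  # sample 1, channel 1

view_sees_write = bool(data[0, 0] == 1.5 and data[1, 1] == 2.5)
copy_sees_write = bool(copied[0, 0] == 1.5)
print(data.shape, view_sees_write, copy_sees_write)  # (3, 4) True False
```

Because the array is a view, it keeps tracking the buffer: if the C library refills the same buffer later, the array contents change too, so take a .copy() if a snapshot is needed.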

For me the following code produces the right shapes. (Hard to verify if it works correctly without a MWE for the data that should be in the buffers).

import numpy as np
from ctypes import c_double, c_float

# Assume a 2-second window, 3 channels, sampled at 1024 Hz
# data: (n_channels, n_samples) = (3, 2048)
# timestamps: (n_samples,) = (2048,)
n_channels = 3
n_samples = 2048
n_data_values = n_channels * n_samples
data_buffer = (c_float * n_data_values)() # Note that c_float is typically 32 bits (4 bytes) while c_double and numpy's default dtype are 64 bits (8 bytes)
ts_buffer = (c_double * n_samples)()

# Create a mock buffer

input_data = np.arange(0,n_data_values, dtype=c_float)
input_data_buffer = input_data.tobytes()


# c_double matches numpy's default float64, so no dtype is needed here
timestamps = np.frombuffer(ts_buffer)

# The dtype must be specified for the array of 32-bit floats
data = np.frombuffer(input_data_buffer, dtype=c_float).reshape(-1, n_channels, order="C").T
# data has values 0,1,2 for first time point, 3,4,5 for second, and so on
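
To pass the shape and order directly, as asked in the question, one option is the np.ndarray constructor with its buffer argument. This is a sketch rather than a benchmarked recommendation: the 5-sample buffer and its 0..14 contents are made up for the test, and I have not timed it against np.frombuffer. A sample-interleaved buffer read as (n_channels, n_samples) is exactly Fortran order, so no reshape or transpose is needed. np.ctypeslib.as_array is another way to get a view and infers the dtype from the ctypes array.

```python
import numpy as np
from ctypes import c_float

n_channels = 3
n_samples = 5  # hypothetical small size, for illustration only
n_data_values = n_channels * n_samples

# Mock ctypes buffer holding 0, 1, 2, ... in sample-interleaved order
data_buffer = (c_float * n_data_values)(*range(n_data_values))

# Approach from the answer: flat view, then reshape and transpose
via_frombuffer = (
    np.frombuffer(data_buffer, dtype=np.float32).reshape(-1, n_channels).T
)

# Shape and order given up front: element [c, s] sits at flat index c + s * n_channels
via_ndarray = np.ndarray(
    (n_channels, n_samples), dtype=np.float32, buffer=data_buffer, order="F"
)

# dtype inferred from the ctypes array type
via_ctypeslib = np.ctypeslib.as_array(data_buffer).reshape(-1, n_channels).T

print(via_ndarray[0])  # first channel across all samples
print(np.array_equal(via_frombuffer, via_ndarray))
print(np.array_equal(via_frombuffer, via_ctypeslib))
```

All three arrays have shape (n_channels, n_samples) and are views on data_buffer, so the choice is mostly about readability.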