Consider a system with n_channels channels transmitting n_samples samples at a given sampling rate. The 1D buffer containing the timestamps and the 2D buffer containing the data, of shape (n_channels, n_samples), are created as follows:
from ctypes import c_double, c_float
# Assume a 2-second window, 3 channels, sampled at 1024 Hz
# data: (n_channels, n_samples) = (3, 2048)
# timestamps: (n_samples,) = (2048,)
n_channels = 3
n_samples = 2048
n_data_values = n_channels * n_samples
data_buffer = (c_float * n_data_values)()
ts_buffer = (c_double * n_samples)()
I have a C binary library that fills both buffers. The call can be summarized as:
from ctypes import byref
# fill_buffers stands in for the C library function
fill_buffers(
    byref(data_buffer),
    byref(ts_buffer),
)
At this point, I have 2 filled buffers, one with 2048 elements (timestamps) and one with 3 * 2048 elements (data). I want to load those 2 buffers into numpy arrays as efficiently as possible. np.frombuffer seems amazing for reading a 1D array, e.g. the timestamps, but I can't find a counterpart for N-dim arrays.
# read from buffer for the 1D array
timestamps = np.frombuffer(ts_buffer) # 192 ns ± 1.11 ns per loop
timestamps = np.array(ts_buffer) # 854 ns ± 2.99 ns per loop
For now, the data array is loaded with:
data = np.array(data_buffer).reshape(-1, n_channels, order="C").T
Is there any way to use the same efficient method as np.frombuffer while providing the output shape and the order?
This question is different from How can I initialize a NumPy array from a multidimensional buffer? and from How to restore a 2-dimensional numpy.array from a bytestring? since it does not ask for just any alternative to np.frombuffer, but for an alternative that is as efficient.
EDIT: Why is np.frombuffer(data_buffer).reshape(-1, n_channels).T not working? With 3 channels and 1024 points (to speed up my testing), I get len(data_buffer) = 3072, but:
np.array(data_buffer).reshape(-1, 3).T.size = 3072
np.frombuffer(data_buffer).reshape(-1, 3).T.size = 1536
The application is a LabStreamingLayer buffer. The buffer is filled here https://github.com/labstreaminglayer/liblsl-Python/blob/87276974a311bcf7ceb3383e9d04c6bdcf302771/pylsl/pylsl.py#L854-L861 using the C library https://github.com/sccn/liblsl, specifically with this function https://github.com/sccn/liblsl/blob/08aa186326e9a339316b7d5677ef31b3651b4aad/src/lsl_inlet_c.cpp#L180-L185
Answer:
Does np.frombuffer(data_buffer, dtype=c_float).reshape(-1, n_channels, order="C").T not work correctly? The way you are doing it, np.array also treats the buffer as a 1D array until you reshape it anyway.
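The size mismatch in the question's EDIT is consistent with np.frombuffer defaulting to float64: the same bytes are then read as half as many elements. A minimal sketch of that arithmetic, using an empty buffer of the same size as in the EDIT (3 channels, 1024 samples):

```python
import numpy as np
from ctypes import c_float

n_channels = 3
n_samples = 1024  # same size as in the question's EDIT
data_buffer = (c_float * (n_channels * n_samples))()

# Default dtype is float64: every 8 bytes become one element, so the
# 12288-byte buffer is read as 1536 values instead of 3072.
wrong = np.frombuffer(data_buffer)

# Matching the dtype to c_float (float32) recovers every element.
right = np.frombuffer(data_buffer, dtype=np.float32)

print(wrong.dtype, wrong.size)  # float64 1536
print(right.dtype, right.size)  # float32 3072
print(right.reshape(-1, n_channels).T.shape)  # (3, 1024)
```

With the dtype given explicitly, the reshape and transpose from the question apply unchanged.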
For me, the code below produces the right shapes. (It is hard to verify that the values are correct without an MWE for the data that should be in the buffers.)
import numpy as np
from ctypes import c_double, c_float
# Assume a 2-second window, 3 channels, sampled at 1024 Hz
# data: (n_channels, n_samples) = (3, 2048)
# timestamps: (n_samples,) = (2048,)
n_channels = 3
n_samples = 2048
n_data_values = n_channels * n_samples
# Note: c_float is typically 32 bits, while c_double and numpy's default dtype (float64) are 64 bits
data_buffer = (c_float * n_data_values)()
ts_buffer = (c_double * n_samples)()
# Create a mock buffer
input_data = np.arange(0,n_data_values, dtype=c_float)
input_data_buffer = input_data.tobytes()
timestamps = np.frombuffer(ts_buffer)
# Note: the data type must be specified for the array of floats
data = np.frombuffer(input_data_buffer, dtype=c_float).reshape(-1, n_channels, order="C").T
# data has values 0,1,2 for first time point, 3,4,5 for second, and so on
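The same approach should also work directly on the ctypes buffers rather than on a bytes copy. The sketch below is only an illustration: mock_fill_buffers is a hypothetical pure-Python stand-in for the C call, and it assumes the library writes the data sample by sample (all channels of sample 0, then all channels of sample 1, and so on), which is what the reshape(-1, n_channels).T in the question implies.

```python
import numpy as np
from ctypes import c_double, c_float

n_channels = 3
n_samples = 2048
data_buffer = (c_float * (n_channels * n_samples))()
ts_buffer = (c_double * n_samples)()


def mock_fill_buffers(data_buf, ts_buf):
    """Hypothetical stand-in for the C call: writes sample-major data."""
    data_buf[:] = [float(i) for i in range(len(data_buf))]
    ts_buf[:] = [i / 1024 for i in range(len(ts_buf))]


mock_fill_buffers(data_buffer, ts_buffer)

# Both arrays are views on the ctypes memory; nothing is copied.
timestamps = np.frombuffer(ts_buffer, dtype=np.float64)
data = np.frombuffer(data_buffer, dtype=np.float32).reshape(n_samples, n_channels).T

print(data.shape)   # (3, 2048)
print(data[:, 0])   # [0. 1. 2.] -> the three channels of the first sample

# Because data is a view, a later write to the buffer is visible in the array.
data_buffer[0] = 42.0
print(data[0, 0])   # 42.0
```

Since the result is a view, it reflects whatever the library writes into the buffer next; call .copy() on it if the values must survive the next fill. The transposed array is also not C-contiguous, so np.ascontiguousarray(data) can be used (at the cost of a copy) if downstream code requires that layout.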