I know the title is confusing, but I don't know how to describe it better, so let the code explain itself:
I have a third-party library that defines a complex scalar as
typedef struct {
float real;
float imag;
} cpx;
so a complex array/vector looks like
cpx array[10];
for (int i = 0; i < 10; i++)
{
/* array[i].real and array[i].imag are the real/imag parts of the i-th member */
}
The current situation is that I have a function taking two float arrays as arguments, and inside it I use two temporary local complex arrays:
void my_func(float *x, float *y) /* x is input, y is output, length is fixed, say 10 */
{
cpx tmp_cpx_A[10]; /* two local cpx array */
cpx tmp_cpx_B[10];
for (int i = 0; i < 10; i++) /* tmp_cpx_A is based on input x */
{
tmp_cpx_A[i].real = do_some_calculation(x[i]);
tmp_cpx_A[i].imag = do_some_other_calculation(x[i]);
}
some_library_function(tmp_cpx_A, tmp_cpx_B); /* tmp_cpx_B is based on tmp_cpx_A, out-of-place */
for (int i = 0; i < 10; i++) /* output y is based on tmp_cpx_B */
{
y[i] = do_final_calculation(tmp_cpx_B[i].real, tmp_cpx_B[i].imag);
}
}
I notice that after the first loop x is no longer needed, and the second loop is in-place. If I could build tmp_cpx_B in the same memory as x and y, I could save half of the intermediate memory usage.
If the complex array were defined as
typedef struct{
float *real;
float *imag;
} cpx_alt;
then I could simply write
cpx_alt tmp_cpx_B;
tmp_cpx_B.real = x;
tmp_cpx_B.imag = y;
and carry on with the rest, but it is not defined that way.
I cannot change the definition of the third-party library's complex structure, and I cannot take cpx as input, because I want to hide the internal library from outside users and not break the API.
So I wonder if it is possible to initialize a struct array with scalar members, like cpx, from scalar arrays like x and y.
Edit 1: answers to some common questions:
- In practice the array length is up to 960, which means one tmp_cpx array takes 7680 bytes. My platform has 56k of RAM in total, so saving one tmp_cpx saves ~14% of memory usage.
- The third-party library is kissFFT, which performs an FFT on a complex array. It defines its own kiss_fft_cpx instead of using the standard <complex.h> because it can use a macro to switch between floating-point and fixed-point calculation.
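The arithmetic behind these figures, as a small sketch. It assumes a 4-byte float, no padding in cpx, and that "56k" means 56 KiB; MAX_LEN, TOTAL_RAM and the two helper names are made up for illustration:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>

/* Stand-in for the library's complex type shown above. */
typedef struct {
    float real;
    float imag;
} cpx;

/* Assumption: the struct is exactly two floats, with no padding. */
static_assert(sizeof(cpx) == 2 * sizeof(float), "cpx contains padding");

enum { MAX_LEN = 960 };          /* maximum array length */
enum { TOTAL_RAM = 56 * 1024 };  /* assumes "56k" means 56 KiB */

/* Bytes needed for one temporary complex array. */
static size_t tmp_cpx_bytes(void)
{
    return MAX_LEN * sizeof(cpx);
}

/* Share of total RAM taken by one temporary array, in tenths of a percent. */
static unsigned tmp_cpx_permille(void)
{
    return (unsigned)(tmp_cpx_bytes() * 1000u / TOTAL_RAM);
}
```

With a 4-byte float this gives 7680 bytes per array, which is roughly 13 to 14% of the RAM depending on whether "56k" is read as 56000 or 57344 bytes.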
CodePudding user response:
First of all, please note that C has a standardized library for complex numbers, <complex.h>. You might want to use that one instead of some non-standard third-party lib.
The main problem with your code might be execution speed, not memory usage. Allocating 2 * 10 * 2 = 40 floats isn't a big deal on most systems. On the other hand, you touch the same memory area over and over again, which might be needlessly inefficient.
Consider something like this instead:
void my_func (size_t size, const float x[size], float y[size])
{
for(size_t i=0; i<size; i++)
{
cpx cpx_A =
{
.real = do_some_calculation(x[i]),
.imag = do_some_other_calculation(x[i])
};
cpx cpx_B;
// ensure that the following functions work on single variables, not arrays:
some_library_function(&cpx_A, &cpx_B);
y[i] = do_final_calculation(cpx_B.real, cpx_B.imag);
}
}
Fewer instructions and less branching. And as a bonus, less stack usage.
In theory you might also gain a few CPU cycles by restrict qualifying the parameters, though I didn't spot any improvement when I tried that on this code (gcc x86-64).
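For reference, a restrict-qualified version could look like the sketch below. The helper bodies and the single-element some_library_function are hypothetical stand-ins so that the example compiles on its own; they are not the real library calls.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float real; float imag; } cpx;   /* as in the question */

/* Hypothetical stand-ins for the question's helper functions. */
static float do_some_calculation(float v)       { return v; }
static float do_some_other_calculation(float v) { return -v; }
static float do_final_calculation(float re, float im) { return re + 2.0f * im; }

/* Hypothetical single-element library call: here it just copies. */
static void some_library_function(const cpx *in, cpx *out) { *out = *in; }

/* restrict promises the compiler that x and y never overlap, so it is
   free to reorder or combine loads and stores across iterations. */
void my_func(size_t size, const float *restrict x, float *restrict y)
{
    /* Cheap debug check of that promise; it only catches identical pointers. */
    assert(x != (const float *)y);

    for (size_t i = 0; i < size; i++) {
        cpx cpx_A = {
            .real = do_some_calculation(x[i]),
            .imag = do_some_other_calculation(x[i])
        };
        cpx cpx_B;
        some_library_function(&cpx_A, &cpx_B);
        y[i] = do_final_calculation(cpx_B.real, cpx_B.imag);
    }
}
```

The promise is the caller's responsibility: it would not hold for any variant in which a temporary buffer is deliberately laid over the memory of x and y.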
CodePudding user response:
If you want standard-compliant code, you can't reuse the memory pointed to by x and y to hold an array of cpx with the same dimension as the x/y arrays. There are several problems with that approach. The size of the x array plus the size of the y array may not equal the size of the cpx array. The x and y arrays may not be in consecutive memory. And type punning through pointers is not guaranteed to work by the C standard.
So the short answer is: No, you can't.
However, if you are willing to accept code that isn't 100% standard compliant, it's very likely that it can be done. You'll have to check it very carefully on your specific system and accept that you can't move the code to another system without again checking it very carefully on that system (note: by system I mean CPU, compiler and its version, and so on).
There are some things you need to ensure:
- That the x and y arrays are consecutive in memory
- That the cpx array has the same size as the two other arrays
- That alignment is OK
If that holds true, you can go for non-standard type punning. Like:
#define SIZE 10
// Put x and y into a struct
typedef struct {
float x[SIZE];
float y[SIZE];
} xy_t;
Add some asserts to check that the memory layout has no padding:
assert(sizeof(xy_t) == 2 * SIZE * sizeof(float));
assert(sizeof(cpx) == 2 * sizeof(float));
assert(sizeof(cpx[SIZE]) == sizeof(xy_t));
assert(alignof(cpx[SIZE]) == alignof(xy_t));
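Since all of these conditions are known at compile time, they could also be checked with C11 static_assert, so that a mismatch stops the build instead of failing at run time. A minimal sketch, assuming a C11 compiler; the offsetof check is an extra that is not in the runtime list above:

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignof (C11) */
#include <stddef.h>    /* offsetof */

#define SIZE 10

typedef struct { float real; float imag; } cpx;         /* as in the question */
typedef struct { float x[SIZE]; float y[SIZE]; } xy_t;  /* as defined above */

static_assert(sizeof(xy_t) == 2 * SIZE * sizeof(float),
              "xy_t contains padding");
static_assert(sizeof(cpx) == 2 * sizeof(float),
              "cpx contains padding");
static_assert(sizeof(cpx[SIZE]) == sizeof(xy_t),
              "cpx[SIZE] and xy_t differ in size");
static_assert(alignof(cpx[SIZE]) == alignof(xy_t),
              "cpx[SIZE] and xy_t differ in alignment");
/* Extra check: y starts directly after x. */
static_assert(offsetof(xy_t, y) == SIZE * sizeof(float),
              "gap between x and y");
```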
In my_func, change
cpx tmp_cpx_A[SIZE];
cpx tmp_cpx_B[SIZE];
to
cpx tmp_cpx_A[SIZE];
cpx* tmp_cpx_B = (cpx*)x; // Ugly, non-portable type punning
This is the "dangerous" part. Instead of defining a new array, type punning through pointer casting is used so that tmp_cpx_B points to the same memory as x (and y). This is not standard compliant, but on most systems it's likely to work when the above assertions hold.
Now call the function like:
xy_t xt;
for (int i = 0; i < SIZE; i++)
{
xt.x[i] = i;
}
my_func(xt.x, xt.y);
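One thing to watch with this overlay: tmp_cpx_B is interleaved (real, imag, real, imag, ...) while y sits in the second half of the same buffer, so the final loop is no longer trivially in-place. Writing y[i] touches float slot SIZE + i, which belongs to tmp_cpx_B[(SIZE + i) / 2]. A forward loop would therefore overwrite elements before they are read, whereas a loop running from the last index down to 0 reads each element first. A minimal sketch of that ordering; it uses a union rather than the pointer cast so that it can be run on its own, and overlay_t, final_loop_in_place and the body of do_final_calculation are assumptions for illustration:

```c
#include <assert.h>
#include <stddef.h>

#define SIZE 10

typedef struct { float real; float imag; } cpx;   /* as in the question */

/* One buffer viewed either as planar x/y or as interleaved cpx.
   Reading a union member other than the one last written is allowed in C
   (not in C++); it still relies on cpx having no padding. */
typedef union {
    struct { float x[SIZE]; float y[SIZE]; } xy;
    cpx c[SIZE];
} overlay_t;

static_assert(sizeof(cpx[SIZE]) == 2 * SIZE * sizeof(float),
              "cpx array does not match the x/y layout");

/* Hypothetical stand-in for do_final_calculation from the question. */
static float do_final_calculation(float re, float im) { return re + im; }

/* Computes y[i] from c[i] inside the same buffer. The slot of y[i] belongs
   to c[(SIZE + i) / 2], an index that is never below i, so counting i
   downwards reads every element before it can be overwritten. */
static void final_loop_in_place(overlay_t *buf)
{
    for (size_t i = SIZE; i-- > 0; ) {
        cpx v = buf->c[i];   /* read both parts before writing */
        buf->xy.y[i] = do_final_calculation(v.real, v.imag);
    }
}
```

The x half of the buffer holds leftover data afterwards, which matches the question's observation that x is no longer needed at that point.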
End note: As pointed out several times, this approach is not standard compliant, so you should only do this kind of thing if you really, really need to reduce your memory usage. You also need to check your specific system to make sure it will work on that system.