Numpy in-place type casting

I have a NumPy array phase of floats (dtype=np.float32) that I want to convert to an integer array out (dtype=np.uint8). Since speed is an issue, this should happen in-place.

I am working with code from a previous student, and it does not behave as I expect:

import numpy as np

phase = np.arange(0, 4, dtype=np.float32).reshape(2, 2)
out = np.empty((2, 2), dtype=np.uint8)

# Prepare the 2pi -> integer conversion factor and convert.
factor = -(256 / 2 / np.pi)
phase *= factor
print("array phase with dtype float \n ", phase)

# There is some randomness involved in casting positive floats to integers.
# Avoid this by going all negative.
maximum = np.amax(phase)
if maximum >= 0:
    toshift = 256 * 2 * np.ceil(maximum / 256)
    phase -= toshift

# Copy and cast the data to the output.
np.copyto(out, phase, casting="unsafe")
print("phase array dtype unsigned integer", out)

# This part (along with the choice of type) implements modulo much faster than np.mod().
bw = int(256 - 1)
np.bitwise_and(out, bw, out=out)
print("array module bit depth \n", out)

The output is

array phase with dtype float 
  [[  -0.      -162.97466]
 [-325.9493  -488.92395]]
phase array dtype unsigned integer [[  0  94]
 [187  24]]
array module bit depth 
 [[  0  94]
 [187  24]]

Executing this program yields results that I don't understand:

  1. Why does e.g. -162 get mapped to 94?
  2. I am aware of the flag casting="unsafe", but it is required to do the in-place conversion.
  3. I am also aware that 300 > 256 and hence the np.uint8 data type is too small. I guess I should increase it to np.uint16?
  4. Why is there some randomness involved when casting positive floats to integer?

I have also tried phase.astype(np.uint8), but the results are similarly disappointing.

CodePudding user response:

Since speed is an issue, this should happen in-place.

In-place operations are not always faster. It depends on the target platform and on how NumPy was compiled (a lot of low-level effects need to be considered), although in-place operations are generally not slower. In some cases, simply reusing buffers is enough (to avoid page faults). Did you profile your code and find this to be a bottleneck?
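
For example, here is a minimal timeit sketch (with an arbitrary 4096x4096 test array; the numbers will differ on your machine) to check whether the per-call allocation actually matters:

import timeit
import numpy as np

phase = np.random.rand(4096, 4096).astype(np.float32)
factor = -(256 / 2 / np.pi)
buf = np.empty_like(phase)

# Out-of-place: allocates a fresh result array on every call.
t_alloc = timeit.timeit(lambda: phase * factor, number=100)

# Writing into a preallocated buffer: no allocation per call.
t_buf = timeit.timeit(lambda: np.multiply(phase, factor, out=buf), number=100)

print(f"allocating: {t_alloc:.3f} s, preallocated: {t_buf:.3f} s")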

Why does e.g. -162 get mapped to 94?

This is because the range of the destination type (0..255 inclusive) does not contain -162, nor any negative number, since it is an 8-bit unsigned integer. As a result, a wraparound happens: 256 - 162 = 94. That being said, AFAIK, this cast is undefined behaviour at the C level: the result can change from one platform to another (and actually has, judging from past NumPy questions and issues). I therefore strongly advise you to use a bigger type, or to change your code so the values fit in the range of the target output type.
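
A small sketch reproducing the wraparound (the printed value is what typical x86 builds produce, but it is not guaranteed, and newer NumPy versions may emit a RuntimeWarning for the out-of-range cast):

import numpy as np

arr = np.array([-162.97466], dtype=np.float32)
# Truncates toward zero to -162, then wraps into uint8:
# prints [94] on typical x86 builds, but formally undefined.
print(arr.astype(np.uint8))
# The wraparound interpretation: -162 modulo 256.
print((-162) % 256)  # 94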

I am aware of the flag casting="unsafe", but it is required to do the in-place conversion.

casting="unsafe" is pretty explicit. It basically means: "I know exactly what I am doing and accept the risks and the consequences". Use it at your own risk ;) .
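
To make the flag's role concrete, here is a minimal sketch of what np.copyto does with and without it:

import numpy as np

phase = np.array([[-0.0, -162.97], [-325.95, -488.92]], dtype=np.float32)
out = np.empty((2, 2), dtype=np.uint8)

# The default rule ("same_kind") refuses a float -> unsigned int cast.
try:
    np.copyto(out, phase)
except TypeError as e:
    print(e)

# casting="unsafe" disables the check and does the raw C-level cast.
np.copyto(out, phase, casting="unsafe")
print(out)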

I am also aware that 300 > 256 and hence the np.uint8 data type is too small. I guess I should increase it to np.uint16?

Since the numbers are negative, you should rather use np.int16. Besides that, yes, this is a good idea. A sketch of what the pipeline could look like with an np.int16 intermediate is shown below.
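
Here is that sketch (assuming the goal is to map the phase onto 0..255; the truncation toward zero of the original cast is kept):

import numpy as np

phase = np.arange(0, 4, dtype=np.float32).reshape(2, 2)
factor = -(256 / 2 / np.pi)
phase *= factor

# int16 covers the whole -489..0 range, so this cast is well defined.
tmp = np.empty(phase.shape, dtype=np.int16)
np.copyto(tmp, phase, casting="unsafe")

# Bitwise AND reduces modulo 256 (two's complement), after which
# every value is in 0..255 and the uint8 cast is safe.
out = (tmp & 255).astype(np.uint8)
print(out)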

Why is there some randomness involved when casting positive floats to integer?

It is not really random. The operation is deterministic, but the result depends on the target platform and on the input numbers (and possibly on the low-level state of the processor for the specific target platform). In practice, as long as the input numbers fit in the target range and there are no special values like NaN, Inf or -Inf, it should be fine.
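
A short sketch of the kind of inputs where the cast becomes platform-dependent (the exact outputs will vary between machines and NumPy versions):

import numpy as np

bad = np.array([np.nan, np.inf, -np.inf, 1e10], dtype=np.float32)
# None of these fit in uint8; the cast silently produces
# platform-dependent values (newer NumPy versions may also
# emit a RuntimeWarning about an invalid cast).
print(bad.astype(np.uint8))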

I have also tried phase.astype(np.uint8), but the results are similarly disappointing.

This is normal: the problem is the same, since the same conversion function is called in both cases.


Note that the operation you do is not really an in-place operation, except for np.bitwise_and(out, bw, out=out). That being said, this last call is useless for an np.uint8 type, since the values are bounded to 255 anyway.
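
To make the distinction concrete, here is a sketch labelling which steps of the original code actually reuse a buffer (np.shares_memory just confirms that phase and out are separate allocations):

import numpy as np

phase = np.arange(0, 4, dtype=np.float32).reshape(2, 2)
out = np.empty((2, 2), dtype=np.uint8)
print(np.shares_memory(phase, out))      # False: two separate buffers

phase *= -(256 / 2 / np.pi)              # in-place on phase's buffer
np.copyto(out, phase, casting="unsafe")  # a cast *into* another buffer
np.bitwise_and(out, 255, out=out)        # the only genuinely in-place step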

implements modulo much faster than np.mod()

This is true for positive numbers but not for negative ones. For negative numbers, the result depends on the underlying representation of integers on the target platform: the trick does not work on processors using a ones'-complement (C1) representation. That being said, all mainstream processors use two's complement (C2) these days.
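
A sketch checking the equivalence on a two's-complement machine (which is almost certainly what you are running on):

import numpy as np

x = np.array([300, 7, -40, -162], dtype=np.int16)
# NumPy's % follows Python semantics: the result takes the
# sign of the divisor, so it lands in 0..255 for all inputs.
print(x % 256)   # [ 44   7 216  94]
print(x & 255)   # identical on two's-complement hardware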
