numpy Array Error: Summing elements gives wrong output

Time:07-05

If I sum the terms for an array of 0s and 1s in a plain Python loop, I get a different result than when I do the same thing through a NumPy array. Why does that happen, and what is the solution? The code is given below:

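import numpy as np

vl_1 = vl_2 = 0
string_1 = "00001000100111000010001001100001000100110000100010011000010001011100001"
sb = string_1
# Map the ASCII characters '0'/'1' to the byte values 0/1
table = bytearray.maketrans(b'01', b'\x00\x01')
X = bytearray(sb, "ascii").translate(table)
# Powers of two at the positions of the 1 bits (a float array because of "2.")
Y = 2.**(np.nonzero(X)[0] + 1)
# Loop 1: plain Python integers
for i in range(len(sb)):
    vl_1 = vl_1 + X[i]*2**(i + 1)
# Loop 2: iterate over the NumPy array
for y in np.nditer(Y):
    vl_2 = vl_2 + y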

Note that I am doing the same math operation in both loops, so vl_2==vl_1 should be True, but I get False.

Edit:

  1. This problem occurred in vectorized code, so speed matters. Any solution should therefore rely on NumPy rather than on a slower alternative.

CodePudding user response:

First, running a for loop over a NumPy array is not vectorization and will not improve performance (it will be even slower).

Second, you are handling very large numbers, beyond the range of NumPy's native fixed-width integer types. Native Python ints can handle them, so you need to specify dtype=object to keep NumPy from casting them to a fixed-width type.

import numpy as np

s = "00001000100111000010001001100001000100110000100010011000010001011100001"

table = bytearray.maketrans(b"01", b"\x00\x01")
X = bytearray(s, "ascii").translate(table)

# Using lists
nonzero_bits = [i for i, x in enumerate(X) if x != 0]
vl = sum(2 ** (i + 1) for i in nonzero_bits)

# Using numpy
# The values exceed the int64 range, so convert to dtype=object
# (the elements are then arbitrary-precision Python ints)
# see https://stackoverflow.com/a/37272717/13636407
nonzero_bits_ = np.nonzero(X)[0].astype(object)
vl_ = np.sum(2 ** (nonzero_bits_ + 1))


assert np.all(nonzero_bits_ == nonzero_bits)
assert vl_ == vl

CodePudding user response:

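The loop over np.nditer(Y) accumulates floating-point values (printed in scientific notation once they get large), and floats cannot represent integers of this size exactly, which throws the calculation off a little. I changed the loop slightly: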

vl_2_2 = 0
for y in np.nditer(Y):
    vl_2 = vl_2 + y
    vl_2_2 = vl_2_2 + int(y.item())
    print(f'{vl_2} {int(vl_2)} {vl_2_2}')

vl_2 is the original

vl_2_2 is doing the calculations after converting y to an int

In the printout I also print vl_2 as an int after the calculation.

The results agree in both loops until about the point where the float output switches to scientific notation:

First loop (without duplicates):

32
544
4640
12832
29216
553504
8942112
76050976
210268704
4505236000
73224712736
622980526624
1722492154400
36906864243232
599856817664544
5103456445035040
14110655699776032
302341031851487776
4914027050278875680
23360771123988427296
60254259271407530528
134041235566245736992
2495224477001068343840

Second loop (look at the first number for the original)

32.0 32 32
544.0 544 544
4640.0 4640 4640
12832.0 12832 12832
29216.0 29216 29216
553504.0 553504 553504
8942112.0 8942112 8942112
76050976.0 76050976 76050976
210268704.0 210268704 210268704
4505236000.0 4505236000 4505236000
73224712736.0 73224712736 73224712736
622980526624.0 622980526624 622980526624
1722492154400.0 1722492154400 1722492154400
36906864243232.0 36906864243232 36906864243232
599856817664544.0 599856817664544 599856817664544
5103456445035040.0 5103456445035040 5103456445035040
1.4110655699776032e+16 14110655699776032 14110655699776032
3.0234103185148774e+17 302341031851487744 302341031851487776
4.914027050278875e+18 4914027050278875136 4914027050278875680
2.3360771123988427e+19 23360771123988426752 23360771123988427296
6.025425927140753e+19 60254259271407534080 60254259271407530528
1.3404123556624574e+20 134041235566245740544 134041235566245736992
2.4952244770010683e+21 2495224477001068314624 2495224477001068343840
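The drift in the table above can be reproduced without NumPy. The following is a minimal sketch, assuming that a plain Python float behaves like the float64 values in Y (both are IEEE 754 doubles with a 53-bit significand); the variable names are illustrative only and not from the original code.

```python
# Sketch: accumulate the same powers of two once as exact Python ints
# and once as floats, and record the steps where the two sums disagree.
s = "00001000100111000010001001100001000100110000100010011000010001011100001"

# 2**(i+1) for every position i that holds a '1'
powers = [2 ** (i + 1) for i, ch in enumerate(s) if ch == "1"]

exact = 0      # Python int: arbitrary precision
approx = 0.0   # Python float: IEEE 754 double

mismatch_steps = []
for step, p in enumerate(powers, start=1):
    exact += p
    approx += float(p)
    if int(approx) != exact:
        mismatch_steps.append(step)

print(exact)
print(int(approx))
print(mismatch_steps)

# Above 2**53 a double can no longer represent every integer,
# so adding 1 is lost to rounding:
print(float(2**53) + 1.0 == float(2**53))
```

Each individual power of two is exactly representable as a float; it is the running sum, which needs more than 53 significant bits, that gets rounded.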

CodePudding user response:

With your setup - I like to see some values, not just a vague "not the same" claim.

In [70]: Y
Out[70]: 
array([3.20000000e+01, 5.12000000e+02, 4.09600000e+03, 8.19200000e+03,
       1.63840000e+04, 5.24288000e+05, 8.38860800e+06, 6.71088640e+07,
       1.34217728e+08, 4.29496730e+09, 6.87194767e+10, 5.49755814e+11,
       1.09951163e+12, 3.51843721e+13, 5.62949953e+14, 4.50359963e+15,
       9.00719925e+15, 2.88230376e+17, 4.61168602e+18, 1.84467441e+19,
       3.68934881e+19, 7.37869763e+19, 2.36118324e+21])


In [72]: X
Out[72]: bytearray(b'\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x01\x01\x01\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x01\x01\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x01\x01\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x01\x01\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x01\x01\x01\x00\x00\x00\x00\x01')

In [73]: for i in range(len(sb)):
    ...:     vl_1 = vl_1 + X[i]*2**(i+1)
    ...:

In [74]: vl_1
Out[74]: 2495224477001068343840


In [76]: for y in np.nditer(Y):
    ...:     vl_2 = vl_2 + y
    ...:

In [77]: vl_2
Out[77]: 2.4952244770010683e+21

One is a float (Y is a float array, after all), but otherwise the values are the same (within float precision):

In [78]: vl_1-vl_2
Out[78]: 0.0
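A difference of 0.0 does not by itself mean the two values are equal as integers. The sketch below uses plain Python numbers (an assumption; NumPy scalars such as np.float64 may differ in detail) and the small example value 2**53 + 1, which is not taken from the question.

```python
# Sketch: subtraction converts the int to a float first, while
# Python's == between an int and a float compares the exact values.
n = 2**53 + 1      # smallest positive integer a double cannot represent
f = float(n)       # rounds to 2**53

print(int(f))      # 9007199254740992
print(n - f)       # 0.0, because n is rounded to a float before subtracting
print(n == f)      # False, because the exact values differ by 1
```

So a check like vl_1 - vl_2 can hide a rounding error that an equality test exposes.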

nditer does nothing for you:

In [79]: vl_2 = 0
    ...: for y in Y: vl_2 = vl_2 + y

In [80]: vl_2
Out[80]: 2.4952244770010683e+21

Iterating over an array is slow in any case, and you don't need it:

In [81]: np.sum(Y)
Out[81]: 2.4952244770010683e+21

edit

If you replace 2. with 2 when constructing Y:

In [95]: 2.**(np.nonzero(X)[0]+1)
Out[95]: 
array([3.20000000e+01, 5.12000000e+02, 4.09600000e+03, 8.19200000e+03,
       1.63840000e+04, 5.24288000e+05, 8.38860800e+06, 6.71088640e+07,
       1.34217728e+08, 4.29496730e+09, 6.87194767e+10, 5.49755814e+11,
       1.09951163e+12, 3.51843721e+13, 5.62949953e+14, 4.50359963e+15,
       9.00719925e+15, 2.88230376e+17, 4.61168602e+18, 1.84467441e+19,
       3.68934881e+19, 7.37869763e+19, 2.36118324e+21])

In [96]: 2**(np.nonzero(X)[0]+1)
Out[96]: 
array([                 32,                 512,                4096,
                      8192,               16384,              524288,
                   8388608,            67108864,           134217728,
                4294967296,         68719476736,        549755813888,
             1099511627776,      35184372088832,     562949953421312,
          4503599627370496,    9007199254740992,  288230376151711744,
       4611686018427387904,                   0,                   0,
                         0,                   0], dtype=int64)

The second is integer values, but the last 4 are too large for int64.
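The count of overflowing entries can be checked with plain Python integers. This is a small sketch, assuming a signed 64-bit range of -2**63 to 2**63 - 1 for int64; INT64_MAX and the other names are illustrative helpers, not NumPy attributes.

```python
# Sketch: which of the 23 powers of two do not fit into a signed 64-bit int?
s = "00001000100111000010001001100001000100110000100010011000010001011100001"

INT64_MAX = 2**63 - 1  # largest value a signed 64-bit integer can hold

powers = [2 ** (i + 1) for i, ch in enumerate(s) if ch == "1"]
too_large = [p for p in powers if p > INT64_MAX]

print(len(powers))             # 23 terms in total
print(len(too_large))          # 4 of them exceed the int64 range
print(sum(powers).bit_length())  # 72 bits are needed for the full sum
```

The four oversized terms are 2**64, 2**65, 2**66 and 2**71, which matches the four zeros at the end of the int64 array.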

Skipping the last part of X I get the same integer result:

In [100]: sum(2**(np.nonzero(X[:-8])[0]+1))
Out[100]: 4914027050278875680

In [101]: sum([x*2**(i+1) for i,x in enumerate(X[:-8])])
Out[101]: 4914027050278875680

The other answer suggested going with object dtype. While it may work, it loses most of the speed advantages of working with numeric dtype arrays.
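If an exact result is needed without a Python-level loop, one possible alternative (not proposed anywhere in this thread, so treat it as an assumption about what fits the use case) is to let Python's built-in int parse the bit string directly. Character i carries the weight 2**(i+1), so the sum equals twice the value of the reversed string read as a binary number.

```python
# Sketch: exact sum of 2**(i+1) over the positions of '1', without NumPy.
s = "00001000100111000010001001100001000100110000100010011000010001011100001"

# Reversing puts character 0 in the least significant position;
# shifting left by one bit turns the weights 2**i into 2**(i+1).
value = int(s[::-1], 2) << 1

# Reference result from the explicit loop used in the question
reference = sum(2 ** (i + 1) for i, ch in enumerate(s) if ch == "1")

print(value)
print(value == reference)
```

The parsing runs in C inside the interpreter and returns an arbitrary-precision int, so it avoids both the float rounding and the int64 overflow described above.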
