Is this a bug? Cannot simply change dict keys from numpy to primitive data types

Time:03-10

I have a dictionary generated by pandas which has numpy.int64 objects instead of native ints as keys. I need to change these to the native type, and I am confused as to why the following code does not work:

import numpy as np

d = {np.int64(0): None}

for k, v in d.items():
    print(str(type(k)))     # <class 'numpy.int64'>
    k_nat = k.item()
    print(str(type(k_nat))) # <class 'int'>
    print(d)                # {0: None}
    d.update({k_nat:1})
    print(d)                # {0: 1}
                            # Therefore update using int was successful

for k, v in d.items():
    print(str(type(k)))     # <class 'numpy.int64'>

Can anyone explain what's going on here? From my perspective, this code contradicts itself as the update using the primitive k_nat was successful, but in the end the key is still a numpy.int64.

CodePudding user response:

No, this is not a bug.

This code shows that the key has not changed during the update:

import numpy as np
d = {np.int64(0): None}

for k, v in d.items():
    print(str(type(k)))     # <class 'numpy.int64'>
    k_nat = k.item()
    print(str(type(k_nat))) # <class 'int'>
    print(d)                # {0: None}
    d.update({k_nat:1})
    print(d)                # {0: 1}
                            # Therefore update using int was successful
                            # But key does not change
    print(type(list(d.keys())[0])) # → <class 'numpy.int64'>

for k, v in d.items():
    print(str(type(k)))     # <class 'numpy.int64'>

Python treats int(0) and np.int64(0) as the same key for dict access, because they compare equal and have the same hash. The update therefore replaces only the value; the original key object stays in the dict. Note that both int(0) and np.int64(0) are displayed as 0 in expressions like print(d), so they look as if they were the same. However, they are equal but not identical.

In particular, we have this behavior:

print(d[np.int64(0)] == d[int(0)]) # True
print(np.int64(0) == int(0)) # True
print(np.int64(0) is int(0)) # False
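
The same point can be checked on the dict itself. This is a minimal sketch, assuming standard CPython dict behavior (assigning to an existing, equal key replaces the value and keeps the stored key object); the names key_np and stored_key are illustrative only:

```python
import numpy as np

key_np = np.int64(0)
d = {key_np: None}

# A native int with the same value and hash lands on the existing entry.
assert hash(0) == hash(key_np) and 0 == key_np
d[0] = 1

stored_key = next(iter(d))
print(len(d))                # 1
print(stored_key is key_np)  # True
print(type(stored_key))      # <class 'numpy.int64'>
print(d[key_np])             # 1
```

The dict still has one entry, and its key is the very same numpy.int64 object that was inserted first.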

If you want to convert the key-type, you can use:

new_d = {int(k): v for k, v in d.items()}
print(type(list(new_d.keys())[0])) # <class 'int'>
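
If the keys are not all integers, int(k) may not be the right conversion. A possible variant, sketched here under the assumption that every NumPy key is a scalar (an instance of np.generic), uses .item() as in the question; native_keys is a hypothetical helper name, not part of NumPy:

```python
import numpy as np

def native_keys(mapping):
    """Return a new dict; NumPy scalar keys are converted with .item()."""
    return {
        (k.item() if isinstance(k, np.generic) else k): v
        for k, v in mapping.items()
    }

mixed = {np.int64(0): "a", np.float64(1.5): "b", "name": "c"}
converted = native_keys(mixed)
print([type(k).__name__ for k in converted])  # ['int', 'float', 'str']
```

Keys that are already native (like the string above) are passed through unchanged.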

For some classes it is indeed possible to change the type of an object in place. The id of the object does not change, so it still works as the same dict key:

class A(object):
    pass

class B(object):
    pass

d = {A(): None}

print(type(list(d.keys())[0])) # <class '__main__.A'>

# change type of object but not the object itself
list(d.keys())[0].__class__ = B
print(type(list(d.keys())[0])) # <class '__main__.B'>

However, for some other classes (including np.int64) this is not possible:

x = np.int64(0)
try:
    x.__class__ = int
except TypeError as err:
    # Exact wording varies by Python version, e.g.:
    # __class__ assignment only supported for heap types or ModuleType subclasses
    print(err)

CodePudding user response:

Both 0 and np.int64(0) hash to the same value:

import numpy as np

print(hash(0))
print(hash(np.int64(0)))

Output:

0
0

So your dictionary did not actually replace the key's data type: the update found the existing key and only changed its value. You can achieve the behavior you want using a simple dict comprehension (modifying a dict while looping over it can be a bad idea in any case):

import numpy as np
d = {np.int64(0): None}

for k, v in d.items():
    print(str(type(k)))     # <class 'numpy.int64'>

d = {int(k): v for k, v in d.items()}
print(d)                    # {0: None}
for k, v in d.items():
    print(str(type(k)))     # <class 'int'>
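
To illustrate why modifying a dict while looping over it is risky, here is a small sketch (CPython behavior; the variable names are illustrative). The update in the question did not trigger this only because it never added a new key:

```python
d = {0: None, 1: None}
error_message = None

try:
    for k in d:
        d[k + 100] = None  # adds a new key while iterating
except RuntimeError as err:
    error_message = str(err)

print(error_message)  # dictionary changed size during iteration
```

Building a new dict, as in the comprehension above, avoids the problem entirely.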

Depending on how you actually arrived at your dictionary, though, you might be better off simply changing the dtype of your pandas Series/DataFrame.
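
A related option is to convert before the dict is built at all. This is only a sketch, assuming the keys are available as a NumPy array (the arrays below are made up for illustration); ndarray.tolist() returns native Python scalars:

```python
import numpy as np

keys = np.array([0, 1, 2], dtype=np.int64)
values = ["a", "b", "c"]

d_numpy = dict(zip(keys, values))            # iterating the array yields numpy.int64
d_native = dict(zip(keys.tolist(), values))  # tolist() yields native int

print(type(next(iter(d_numpy))).__name__)   # int64
print(type(next(iter(d_native))).__name__)  # int
```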
