I have a dictionary generated by pandas which has numpy.int64 objects instead of native ints as keys. I need to change these to the native type, and am confused as to why the following code does not work as expected:
import numpy as np
d = {np.int64(0): None}
for k, v in d.items():
    print(str(type(k))) # <class 'numpy.int64'>
    k_nat = k.item()
    print(str(type(k_nat))) # <class 'int'>
    print(d) # {0: None}
    d.update({k_nat:1})
    print(d) # {0: 1}
# Therefore update using int was successful
for k, v in d.items():
    print(str(type(k))) # <class 'numpy.int64'>
Can anyone explain what's going on here? From my perspective, this code contradicts itself, as the update using the primitive k_nat was successful, but in the end the key is still a numpy.int64.
CodePudding user response:
No, this is not a bug.
This code shows that the key has not changed during the update:
import numpy as np
d = {np.int64(0): None}
for k, v in d.items():
    print(str(type(k))) # <class 'numpy.int64'>
    k_nat = k.item()
    print(str(type(k_nat))) # <class 'int'>
    print(d) # {0: None}
    d.update({k_nat:1})
    print(d) # {0: 1}
# Therefore update using int was successful
# But key does not change
print(type(list(d.keys())[0])) # → <class 'numpy.int64'>
for k, v in d.items():
    print(str(type(k))) # <class 'numpy.int64'>
Python treats int(0) and np.int64(0) as the same key with respect to dict access, because they compare equal and have the same hash. However, the update does not replace the original key object; only the value changes. Note that, at least in NumPy versions before 2.0, both int(0) and np.int64(0) are displayed as 0 in expressions like print(d), so they look as if they were the same. They are equal but not identical.
In particular, we have this behavior:
print(d[np.int64(0)] == d[int(0)]) # True
print(np.int64(0) == int(0)) # True
print(np.int64(0) is int(0)) # False
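The same behavior can be reproduced without NumPy, which may make the mechanism easier to see. The sketch below uses True, 1 and 1.0 as stand-ins for np.int64(0) and 0, since they also compare equal and share a hash; these values are only an illustration and are not part of the original question.

```python
# True, 1 and 1.0 compare equal and have the same hash,
# so a dict treats them as one and the same key.
assert True == 1 == 1.0
assert hash(True) == hash(1) == hash(1.0)

d = {True: "a"}
d[1] = "b"              # overwrites the value, keeps the key object True
d.update({1.0: "c"})    # same again: only the value is replaced

key = next(iter(d))
print(len(d))           # 1
print(type(key))        # <class 'bool'>
print(d[key])           # c
```

The dict keeps the key object that was inserted first and only swaps the value, which is what happens with the numpy.int64 key above.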
If you want to convert the key-type, you can use:
new_d = {int(k): v for k, v in d.items()}
print(type(list(new_d.keys())[0])) # <class 'int'>
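If the dictionary has to be changed in place (for example because other code holds a reference to it), one possible approach is to remove each entry and insert it again under the converted key. The following is a minimal sketch, not a definitive implementation: the helper name replace_keys_in_place and its convert parameter are hypothetical, and bool keys stand in for np.int64 keys so that it runs without NumPy.

```python
def replace_keys_in_place(d, convert):
    """Re-insert every entry of d under convert(key), mutating d itself."""
    for old_key in list(d):          # snapshot of the keys; d changes below
        value = d.pop(old_key)       # remove the entry with the old key object
        d[convert(old_key)] = value  # store it again under the converted key


# bool keys stand in for np.int64 keys so the sketch runs without NumPy
d = {True: "a", False: "b"}
alias = d                            # second reference to the same dict
replace_keys_in_place(d, int)

print([type(k) for k in alias])      # [<class 'int'>, <class 'int'>]
print(alias)                         # {1: 'a', 0: 'b'}
```

With the dictionary from the question, one would presumably pass int or lambda k: k.item() as convert. The sketch assumes that each converted key compares equal to its original and that the converted keys stay distinct; otherwise entries could overwrite each other.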
For some classes it is indeed possible to change the type of an object without changing the id of the object, and thus it still works as the same dict key:
class A(object):
    pass
class B(object):
    pass
d = {A(): None}
print(type(list(d.keys())[0])) # <class '__main__.A'>
# change type of object but not the object itself
list(d.keys())[0].__class__ = B
print(type(list(d.keys())[0])) # <class '__main__.B'>
However, for some other classes (including np.int64) this is not possible:
x = np.int64(0)
try:
    x.__class__ = int
except TypeError as err:
    print(err) # __class__ assignment only supported for heap types or ModuleType subclasses
CodePudding user response:
Both 0 and np.int64(0) hash to the same value:
print(hash(0))
print(hash(np.int64(0)))
Output:
0
0
So the update did not actually replace the key's data type. You can achieve the behavior you want with a simple dict comprehension (modifying a dictionary while looping over it can be a bad idea in any case):
import numpy as np
d = {np.int64(0): None}
for k, v in d.items():
    print(str(type(k))) # <class 'numpy.int64'>
d = {int(k):v for k,v in d.items()}
print(d)
for k, v in d.items():
    print(str(type(k)))
Depending on how you actually arrived at your dictionary, though, you might be better off simply changing the dtype of your pandas Series/DataFrame.