Consider the following Python code:
import cloudpickle

class Foo:
    def __init__(self, num):
        self.num = num

def outer(num):
    return Foo(num)

print(cloudpickle.dumps(outer))
This produces a different pickle every time you run the code. Analysing the pickles from two runs with pickletools shows the following diff:
144c144
< 552: \x8c SHORT_BINUNICODE '2e3db4572bb349268962a75a8a6f034c'
---
> 552: \x8c SHORT_BINUNICODE '89ee770de9b745c4bbe83c353f1debba'
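For reference, a diff like this can also be narrowed down programmatically with pickletools.genops. The sketch below uses only the standard library. The two payloads are stand-ins (plain dicts carrying a random hex string, not the cloudpickle output above), and string_args and differing_strings are illustrative helper names, not part of pickletools.

```python
import pickle
import pickletools
import uuid


def string_args(payload):
    """Return the argument of every string-valued opcode in a pickle."""
    return [arg for _op, arg, _pos in pickletools.genops(payload)
            if isinstance(arg, str)]


def differing_strings(a, b):
    """Pair up string arguments position by position; keep the ones that differ."""
    return [(x, y) for x, y in zip(string_args(a), string_args(b)) if x != y]


# Stand-in payloads: identical except for one random 32-character hex string.
first = pickle.dumps({"name": "Foo", "id": uuid.uuid4().hex})
second = pickle.dumps({"name": "Foo", "id": uuid.uuid4().hex})

# Prints a single (old, new) pair, analogous to the one-line diff above.
print(differing_strings(first, second))
```

Running the same helpers on two saved cloudpickle outputs should isolate the changing string in the same way, assuming both pickles have the same opcode layout.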
Now, I understand that cloudpickle doesn't guarantee that its pickles are deterministic (link), but I am curious why these two pickles differ. The difference above looks like some sort of hash of the Foo class that changes between runs.
Note that I ran the Python program with a fixed PYTHONHASHSEED.
PS: This is enough to reproduce the issue:
import pickletools
import cloudpickle

class Foo:
    def __init__(self, num):
        self.num = num

pickletools.dis(cloudpickle.dumps(Foo))
So it seems that each class has some property that gets baked into the pickle, but I don't know what that property is.
CodePudding user response:
Curious!
I dug into the source code and found that it's neither a property of the class nor a computed hash. It's just a random identifier generated with uuid4() once per class, which is why a fixed PYTHONHASHSEED makes no difference.
That function gets called by _class_getnewargs(), which is in turn called by _dynamic_class_reduce(), whose comment reads:
Save a class that can't be stored as module global. This method is used to serialize classes that are defined inside functions, or that otherwise can't be serialized as attribute lookups from global modules.
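To make the mechanism concrete, here is a minimal sketch of the idea. It is not cloudpickle's actual code: _tracker_ids and get_or_create_tracker_id are names made up for illustration, and the weak-keyed registry is an assumption about how such a per-class identifier could be cached.

```python
import uuid
import weakref

# Hypothetical registry: class object -> random identifier.
# Weak keys let an entry disappear once its class is garbage collected.
_tracker_ids = weakref.WeakKeyDictionary()


def get_or_create_tracker_id(cls):
    """Return the identifier for cls, minting a random one on first use."""
    tracker_id = _tracker_ids.get(cls)
    if tracker_id is None:
        # uuid4() draws from the OS random source, so PYTHONHASHSEED has no effect.
        tracker_id = uuid.uuid4().hex
        _tracker_ids[cls] = tracker_id
    return tracker_id


class Foo:
    def __init__(self, num):
        self.num = num


first = get_or_create_tracker_id(Foo)
second = get_or_create_tracker_id(Foo)
print(first == second)  # True: the identifier is stable within one process
print(len(first))       # 32: same shape as the strings in the pickletools diff
```

Within one process the identifier is reused, but a fresh interpreter mints a new one, which matches the observation that every run produces a different pickle.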
Things are immediately much less complicated if the class is not in the __main__ module (since __main__ could be anything from the eventual unpickler's perspective). If you do from b import outer and cloudpickle that outer, you get
0: \x80 PROTO 5
2: \x95 FRAME 15
11: \x8c SHORT_BINUNICODE 'b'
14: \x94 MEMOIZE (as 0)
15: \x8c SHORT_BINUNICODE 'outer'
22: \x94 MEMOIZE (as 1)
23: \x93 STACK_GLOBAL
24: \x94 MEMOIZE (as 2)
25: . STOP
as the pickle instead of all the voodoo cloudpickle does to pickle something that is in __main__.
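The same by-reference behaviour can be seen with the standard library alone. In the sketch below, json.dumps is only a stand-in for b.outer (any function importable by module and name should behave the same way), and plain pickle is used instead of cloudpickle.

```python
import json
import pickle
import pickletools

# json.dumps stands in for b.outer: a function importable by module and name.
payload = pickle.dumps(json.dumps, protocol=5)

# The only strings in the pickle are the module name and the attribute name.
names = [arg for _op, arg, _pos in pickletools.genops(payload)
         if isinstance(arg, str)]
print(names)

# No random identifier is involved, so repeated dumps are byte-identical.
print(payload == pickle.dumps(json.dumps, protocol=5))

# Full disassembly, for comparison with the listing above.
pickletools.dis(payload)
```

Because the pickle only records where to import the function from, there is nothing per-run to vary, unlike the by-value path taken for objects defined in __main__.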