Different cloudpickle output every time the program is run

Time:10-22

Consider the following python code:

import cloudpickle


class Foo:
    def __init__(self, num):
        self.num = num


def outer(num):
    return Foo(num)


print(cloudpickle.dumps(outer))

This produces a different pickle every time you run the code. Analysing the pickle with pickletools shows the following diff:

144c144
<   552: \x8c         SHORT_BINUNICODE '2e3db4572bb349268962a75a8a6f034c'
---
>   552: \x8c         SHORT_BINUNICODE '89ee770de9b745c4bbe83c353f1debba'
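The same comparison can be made without a textual diff by walking both opcode streams with pickletools.genops() and comparing their string arguments. This is a minimal sketch that uses the standard-library pickle on two stand-in dicts, so it runs without cloudpickle; the helper names string_args and diff_strings are hypothetical, and it assumes both pickles have the same opcode layout, as in the diff above.

```python
import pickle
import pickletools


def string_args(data: bytes) -> list[str]:
    """Return every string argument in a pickle's opcode stream, in order."""
    return [arg for _op, arg, _pos in pickletools.genops(data) if isinstance(arg, str)]


def diff_strings(a: bytes, b: bytes) -> list[tuple[str, str]]:
    """Pair up string arguments from two pickles and keep the pairs that differ.

    Assumes both pickles share the same opcode layout, so the i-th string
    in one corresponds to the i-th string in the other.
    """
    return [(x, y) for x, y in zip(string_args(a), string_args(b)) if x != y]


# Stand-in payloads: two pickles that differ only in one string argument.
first = pickle.dumps({"name": "Foo", "id": "2e3db4572bb349268962a75a8a6f034c"})
second = pickle.dumps({"name": "Foo", "id": "89ee770de9b745c4bbe83c353f1debba"})
print(diff_strings(first, second))
```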

Now, I understand that cloudpickle doesn't guarantee deterministic pickles (link), but I am curious why these two pickles differ. The difference above looks like it comes from some kind of hash for the Foo class.

Note that I ran the Python program with a fixed PYTHONHASHSEED.

PS: This is enough to reproduce the issue:

import pickletools
import cloudpickle


class Foo:
    def __init__(self, num):
        self.num = num


pickletools.dis(cloudpickle.dumps(Foo))

So it seems that each class has some property that gets baked into the pickle, but I don't know what that property is.

Answer:

Curious!

I dug into the source code and found that it's neither a property of the class nor a computed hash: it's just a random identifier, generated with uuid4() once per class.

That function, _get_or_create_tracker_id(), is called by _class_getnewargs() here, which in turn is called by _dynamic_class_reduce() here, which has the comment

Save a class that can't be stored as module global. This method is used to serialize classes that are defined inside functions, or that otherwise can't be serialized as attribute lookups from global modules.
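To make the mechanism concrete, here is a simplified, stdlib-only sketch of a per-class tracker id. It is not cloudpickle's actual code: the names get_or_create_tracker_id, lookup_or_track, _ID_BY_CLASS and _CLASS_BY_ID are hypothetical. The id is random but cached per class, so repeated dumps within one process agree, while a fresh process mints a new one. Because uuid4() takes its bytes from os.urandom(), a fixed PYTHONHASHSEED has no effect on it. The lookup side shows one plausible use for such an id: letting the unpickling process reuse a class it has already rebuilt instead of creating a duplicate.

```python
import threading
import uuid
import weakref

# Hypothetical, simplified registries (illustrative names, not cloudpickle's API).
_ID_BY_CLASS = weakref.WeakKeyDictionary()    # class -> tracker id
_CLASS_BY_ID = weakref.WeakValueDictionary()  # tracker id -> class
_LOCK = threading.Lock()


def get_or_create_tracker_id(cls: type) -> str:
    """Return cls's tracker id, minting a random uuid4 hex the first time."""
    with _LOCK:
        tracker_id = _ID_BY_CLASS.get(cls)
        if tracker_id is None:
            # os.urandom-backed, so it differs on every run regardless of PYTHONHASHSEED
            tracker_id = uuid.uuid4().hex
            _ID_BY_CLASS[cls] = tracker_id
            _CLASS_BY_ID[tracker_id] = cls
        return tracker_id


def lookup_or_track(tracker_id: str, rebuilt_cls: type) -> type:
    """On load, reuse a class already registered under tracker_id, if any."""
    with _LOCK:
        cls = _CLASS_BY_ID.setdefault(tracker_id, rebuilt_cls)
        _ID_BY_CLASS[cls] = tracker_id
        return cls


class Foo:
    def __init__(self, num):
        self.num = num


first_id = get_or_create_tracker_id(Foo)
second_id = get_or_create_tracker_id(Foo)
print(first_id == second_id)  # True: cached per class within one process
```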

Things get much simpler if the class is not defined in the __main__ module (from the eventual unpickler's perspective, __main__ could be anything). If you put the code in a module b, then do from b import outer and cloudpickle that outer, you get

    0: \x80 PROTO      5
    2: \x95 FRAME      15
   11: \x8c SHORT_BINUNICODE 'b'
   14: \x94 MEMOIZE    (as 0)
   15: \x8c SHORT_BINUNICODE 'outer'
   22: \x94 MEMOIZE    (as 1)
   23: \x93 STACK_GLOBAL
   24: \x94 MEMOIZE    (as 2)
   25: .    STOP

as the pickle, instead of all the voodoo cloudpickle has to do to pickle something defined in __main__.
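The standard library's pickle stores importable functions the same way, by reference (module name plus attribute name). This stdlib-only sketch uses json.dumps as a stand-in for b.outer, since module b itself is not shown; the exact opcode listing may vary slightly between Python versions.

```python
import json
import pickle
import pickletools

# json.dumps lives in an importable module, so pickle stores only a reference to it.
data = pickle.dumps(json.dumps, protocol=5)
pickletools.dis(data)

# The payload is just the module name, the attribute name, and STACK_GLOBAL.
ops = [op.name for op, _arg, _pos in pickletools.genops(data)]
print(ops)
print(data == pickle.dumps(json.dumps, protocol=5))  # True: no random ids involved
```

Because nothing in that payload is random, the bytes are identical on every run.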
