Finding and Replacing Objects in Python

Today, we’re going to demonstrate a fairly evil thing in Python, which I call object replacement.

Say you have some program that’s been running for a while, and a particular object has made its way throughout your code. It lives inside lists, class attributes, maybe even inside some closures. You want to completely replace this object with another one; that is to say, you want to find all references to object A and replace them with object B, enabling A to be garbage collected. This has some interesting implications for special object types. If you have methods that are bound to A, you want to rebind them to B. If A is a class, you want all instances of A to become instances of B. And so on.

But why on Earth would you want to do that? you ask. I’ll focus on a concrete use case in a future post, but for now, I imagine this could be useful in some kind of advanced unit testing situation with mock objects. Still, it’s fairly insane, so let’s leave it primarily as an intellectual exercise.

This article is written for CPython 2.7.[1]


First, a recap on terminology here. You can skip this section if you know Python well.

In Python, names are what most languages call “variables”. They reference objects. So when we do:

a = [1, 2, 3, 4]

…we are creating a list object with four integers, and binding it to the name a. In graph form:[2]

[Graph: the name a points to the list object [1, 2, 3, 4].]

In each of the following examples, we are creating new references to the list object, but we are never duplicating it. Each reference points to the same memory address (which you can get using id(a)).

b = a
c = SomeContainerClass()
c.data = a
def wrapper(L):
    def inner():
        return L.pop()
    return inner

d = wrapper(a)
[Graph: a, b, c.data, and L (inside d’s closure) all point to the same list object [1, 2, 3, 4].]
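The aliasing claim can be checked on a live interpreter with a self-contained version of the snippet above. The post never defines SomeContainerClass, so the one below is a hypothetical stand-in. The closure is read through __closure__, which works on Python 2.6+ and Python 3 (func_closure exists only on Python 2).

```python
class SomeContainerClass(object):
    """Hypothetical stand-in: the post never defines this class."""

    def __init__(self):
        self.data = None


def wrapper(L):
    def inner():
        return L.pop()
    return inner


a = [1, 2, 3, 4]
b = a
c = SomeContainerClass()
c.data = a
d = wrapper(a)

# Every alias below is the very same list object as a.
aliases = [b, c.data, d.__closure__[0].cell_contents]
print(all(alias is a for alias in aliases))            # True
print(len(set(id(alias) for alias in aliases + [a])))  # 1

d()       # pops 4 through the closure's reference...
print(a)  # ...and every name sees the change: [1, 2, 3]
```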

Note that these references are all equal. a is no more valid a name for the list than b, c.data, or L (or d.func_closure[0].cell_contents to the outside world). As a result, if you delete one of these references (explicitly with del a, or implicitly when a name goes out of scope), the other references are still around, and the object continues to exist. If all of an object’s references disappear, CPython frees it. This happens immediately through reference counting, or during a later garbage-collection pass if the object is part of a reference cycle.
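Here is a small sketch of that lifetime rule. It uses a weak reference to watch the object without keeping it alive. Thing is a hypothetical class, because plain lists cannot be weakly referenced. The immediate cleanup after the last del is CPython’s reference-counting behavior; other implementations may reclaim the object later.

```python
import weakref


class Thing(object):
    """Hypothetical stand-in object (lists cannot be weakly referenced)."""


a = Thing()
b = a
probe = weakref.ref(a)  # observes the object without keeping it alive

del a
print(probe() is b)  # True: b is still a strong reference

del b
# CPython frees the object as soon as its reference count reaches zero.
print(probe() is None)  # True
```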

Dead ends

My first thought when approaching this problem was to physically write over the memory where our target object is stored. This can be done using ctypes.memmove() from the Python standard library:

>>> class A(object): pass
>>> class B(object): pass
>>> obj = A()
>>> print obj
<__main__.A object at 0x10e3e1190>
>>> import ctypes
>>> ctypes.memmove(id(A), id(B), object.__sizeof__(A))
>>> print obj
<__main__.B object at 0x10e3e1190>

What we are doing here is overwriting the fields of A’s underlying C struct with the fields of B’s struct. Because A is a new-style class, that struct is a PyHeapTypeObject. As a result, they now share various properties, such as their attribute dictionaries (__dict__). So, we can do things like this:

>>> A.foo = 123
>>> B.foo
123

However, there are clear issues. What we’ve done is create a shallow copy. Therefore, A and B are still distinct objects, so certain changes made to one will not be replicated to the other:

>>> A is B
False
>>> B.__name__ = "C"
>>> A.__name__
'B'

Also, this won’t work if A and B are different sizes, since we will be either reading from or writing to memory that we don’t necessarily own:

>>> A = ()
>>> B = []
>>> print A.__sizeof__(), B.__sizeof__()
24 40
>>> import ctypes
>>> ctypes.memmove(id(A), id(B), A.__sizeof__())
Python(33575,0x7fff76925300) malloc: *** error for object 0x6f: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6

Oh, and there’s a bit of a problem when we deallocate these objects, too…

>>> A = []
>>> B = range(8)
>>> import ctypes
>>> ctypes.memmove(id(A), id(B), A.__sizeof__())
>>> print A
[0, 1, 2, 3, 4, 5, 6, 7]
>>> del A
>>> del B
Segmentation fault: 11

Fishing for references with Guppy

A more appropriate solution is finding all of the references to the old object, and then updating them to point to the new object, rather than replacing the old object directly.

But how do we track references? Fortunately, there’s a library called Guppy that allows us to do this. It is often used for diagnosing memory leaks, but we can take advantage of its robust object-tracking features here. Install it with pip (pip install guppy).

I’ve always found Guppy hard to use (as are many debuggers, though the complexity of the task justifies it), so we’ll begin with a feature demo before delving into the actual problem.

Feature demonstration

Guppy’s interface is deceptively simple. We begin by calling guppy.hpy() to expose the Heapy interface, which is the component of Guppy with the features we want:

>>> import guppy
>>> hp = guppy.hpy()
>>> hp
Top level interface to Heapy.
Use eg: hp.doc for more info on hp.

Calling hp.heap() shows us a table of the objects known to Guppy, grouped together (mathematically speaking, partitioned) by type[3] and sorted by how much space they take up in memory:

>>> heap = hp.heap()
>>> heap
Partition of a set of 45761 objects. Total size = 4699200 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  15547  34  1494736  32   1494736  32 str
     1   8356  18   770272  16   2265008  48 tuple
     2    346   1   452080  10   2717088  58 dict (no owner)
     3  13685  30   328440   7   3045528  65 int
     4     71   0   221096   5   3266624  70 dict of module
     5   1652   4   211456   4   3478080  74 types.CodeType
     6    199   0   210856   4   3688936  79 dict of type
     7   1614   4   193680   4   3882616  83 function
     8    199   0   177008   4   4059624  86 type
     9    124   0   135328   3   4194952  89 dict of class
<91 more rows. Type e.g. '_.more' to view.>

This object (called an IdentitySet) looks bizarre, but it can be treated roughly like a list. If we want to take a look at strings, we can do heap[0]:

>>> heap[0]
Partition of a set of 22606 objects. Total size = 2049896 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  22606 100  2049896 100   2049896 100 str

This isn’t very useful, though. What we really want to do is re-partition this subset using another relationship. There are a number of options, such as:

>>> heap[0].byid  # Group by object ID; each subset therefore has one element
Set of 22606 <str> objects. Total size = 2049896 bytes.
 Index     Size   %   Cumulative  %   Representation (limited)
     0     7480   0.4      7480   0.4 'The class Bi... copy of S.\n'
     1     4872   0.2     12352   0.6 "Support for ... 'error'.\n\n"
     2     4760   0.2     17112   0.8 'Heap queues\ Art! :-)\n'
     3     4760   0.2     21872   1.1 'Heap queues\ Art! :-)\n'
     4     3896   0.2     25768   1.3 'This module function\n'
     5     3824   0.2     29592   1.4 'The type of order.\n'
     6     3088   0.2     32680   1.6 't\x00\x00|\x...x00|\x02\x00S'
     7     2992   0.1     35672   1.7 'HeapView(roo... size, etc.\n'
     8     2808   0.1     38480   1.9 'Directory tr...ories\n\n    '
     9     2640   0.1     41120   2.0 'The class No... otherwise.\n'
<22596 more rows. Type e.g. '_.more' to view.>
>>> heap[0].byrcs  # Group by what types of objects reference the strings
Partition of a set of 22606 objects. Total size = 2049896 bytes.
 Index  Count   %     Size   % Cumulative  % Referrers by Kind (class / dict of class)
     0   6146  27   610752  30    610752  30 types.CodeType
     1   5304  23   563984  28   1174736  57 tuple
     2   4104  18   237536  12   1412272  69 dict (no owner)
     3   1959   9   139880   7   1552152  76 list
     4    564   2   136080   7   1688232  82 function, tuple
     5    809   4    97896   5   1786128  87 dict of module
     6    346   2    71760   4   1857888  91 dict of type
     7    365   2    19408   1   1877296  92 dict of module, tuple
     8    192   1    16176   1   1893472  92 dict (no owner), list
     9    232   1    11784   1   1905256  93 dict of class, function, tuple, types.CodeType
<229 more rows. Type e.g. '_.more' to view.>
>>> heap[0].byvia  # Group by how the strings are related to their referrers
Partition of a set of 22606 objects. Total size = 2049896 bytes.
 Index  Count   %     Size   % Cumulative  % Referred Via:
     0   2656  12   420456  21    420456  21 '[0]'
     1   2095   9   259008  13    679464  33 '.co_code'
     2   2095   9   249912  12    929376  45 '.co_filename'
     3    564   2   136080   7   1065456  52 '.func_doc', '[0]'
     4    243   1   103528   5   1168984  57 "['__doc__']"
     5   1930   9   100584   5   1269568  62 '.co_lnotab'
     6    502   2    31128   2   1300696  63 '[1]'
     7    306   1    16272   1   1316968  64 '[2]'
     8    242   1    12960   1   1329928  65 '[3]'
     9    184   1     9872   0   1339800  65 '[4]'
<7323 more rows. Type e.g. '_.more' to view.>

From this, we can see that the plurality of memory devoted to strings is taken up by those referenced by code objects (types.CodeType represents Python code—accessible from a non-C-defined function through func.func_code—and contains things like the names of its local variables and the actual sequence of opcodes that make it up).
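If Guppy isn’t installed, the standard library can build a very rough approximation of the type partition that hp.heap() prints. This is not how Guppy works, and it has three limits:

- gc.get_objects() only returns objects tracked by the cycle collector, so most str and int objects never show up.
- sys.getsizeof() measures each object shallowly.
- Objects are grouped by type name rather than by clodo.

rough_heap_by_type is a hypothetical helper.

```python
import collections
import gc
import sys


def rough_heap_by_type(limit=10):
    """Crude stand-in for hp.heap(): group GC-tracked objects by type name.

    Atomic objects the collector does not track (most str and int objects)
    are missing, and sizes are shallow sys.getsizeof() values.
    """
    counts = collections.Counter()
    sizes = collections.Counter()
    for obj in gc.get_objects():
        kind = type(obj).__name__
        counts[kind] += 1
        sizes[kind] += sys.getsizeof(obj, 0)  # 0 if the size is unavailable
    rows = sorted(sizes.items(), key=lambda item: item[1], reverse=True)
    return [(kind, counts[kind], size) for kind, size in rows[:limit]]


for kind, count, size in rough_heap_by_type(5):
    print("%-16s %7d objects %10d bytes" % (kind, count, size))
```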

For fun, let’s pick a random string.

>>> import random
>>> obj = heap[0].byid[random.randrange(0, heap[0].count)]
>>> obj
Set of 1 <str> object. Total size = 176 bytes.
 Index     Size   %   Cumulative  %   Representation (limited)
     0      176 100.0       176 100.0 'Define names...not listed.\n'

Interesting. Since this heap subset contains only one element, we can use .theone to get the actual object represented here:

>>> obj.theone
'Define names for all type symbols known in the standard interpreter.\n\nTypes that are part of optional modules (e.g. array) are not listed.\n'

Looks like the docstring for the types module. We can confirm by using .referrers to get the set of objects that refer to objects in the given set:

>>> obj.referrers
Partition of a set of 1 object. Total size = 3352 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100     3352 100      3352 100 dict of module

This is types.__dict__ (since the docstring we got is actually stored as types.__dict__["__doc__"]), so if we use .referrers again:

>>> obj.referrers.referrers
Partition of a set of 1 object. Total size = 56 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100       56 100        56 100 module
>>> obj.referrers.referrers.theone
<module 'types' from '/usr/local/Cellar/python/2.7.8_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/types.pyc'>
>>> import types
>>> types.__doc__ is obj.theone
True
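The same two hops can be retraced with gc.get_referrers() from the standard library: from the docstring to types.__dict__, and from there to the types module. It is much blunter than .referrers. It returns a plain list and only considers referrers that the cycle collector tracks, though module dicts and module objects are among those.

```python
import gc
import types

doc = types.__doc__

# First hop: among the referrers of the docstring is the module's __dict__.
first_hop = gc.get_referrers(doc)
print(any(ref is types.__dict__ for ref in first_hop))  # True

# Second hop: the module object refers to its own __dict__.
second_hop = gc.get_referrers(types.__dict__)
print(any(ref is types for ref in second_hop))  # True
```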

But why did we find an object in the types module if we never imported it? Well, let’s see. We can use hp.iso() to get the Heapy set consisting of a single given object:

>>> hp.iso(types)
Partition of a set of 1 object. Total size = 56 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100       56 100        56 100 module

Using a similar procedure as before, we see that types is imported by the traceback module:

>>> hp.iso(types).referrers
Partition of a set of 10 objects. Total size = 25632 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      2  20    13616  53     13616  53 dict (no owner)
     1      5  50     9848  38     23464  92 dict of module
     2      1  10     1048   4     24512  96 dict of guppy.etc.Glue.Interface
     3      1  10     1048   4     25560 100 dict of guppy.etc.Glue.Share
     4      1  10       72   0     25632 100 tuple
>>> hp.iso(types).referrers[1].byid
Set of 5 <dict of module> objects. Total size = 9848 bytes.
 Index     Size   %   Cumulative  %   Owner Name
     0     3352  34.0      3352  34.0 traceback
     1     3352  34.0      6704  68.1 warnings
     2     1048  10.6      7752  78.7 __main__
     3     1048  10.6      8800  89.4 abc
     4     1048  10.6      9848 100.0 guppy.etc.Glue

…and traceback, in turn, is imported by site:

>>> import traceback
>>> hp.iso(traceback).referrers
Partition of a set of 3 objects. Total size = 15992 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1  33    12568  79     12568  79 dict (no owner)
     1      1  33     3352  21     15920 100 dict of module
     2      1  33       72   0     15992 100 tuple
>>> hp.iso(traceback).referrers[1].byid
Set of 1 <dict of module> object. Total size = 3352 bytes.
 Index     Size   %   Cumulative  %   Owner Name
     0     3352 100.0      3352 100.0 site

Since site is imported by Python on startup, we’ve figured out why objects from types exist, even though we’ve never used them.

We’ve learned something important, too. When objects are stored as ordinary attributes of a parent object (like types.__doc__, traceback.types, and site.traceback from above), they are not referenced directly by the parent object, but by that object’s __dict__ attribute. Therefore, if we want to replace A with B and A is an attribute of C, we (probably) don’t need to know anything special about C—just how to modify dictionaries.
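The stdlib-only sketch below checks that an attribute really lives in its parent’s __dict__, and that swapping it is just a dictionary assignment. Payload, mod, and holder are hypothetical names, and gc.get_referrers() stands in for Guppy’s .referrers.

```python
import gc
import types


class Payload(object):
    """Hypothetical stand-in for the objects being swapped."""


old, new = Payload(), Payload()

# Module attribute: gc reports the module's __dict__ as the referrer,
# not the module object itself.
mod = types.ModuleType("demo")
mod.attr = old
referrers = gc.get_referrers(old)
print(any(ref is mod.__dict__ for ref in referrers))  # True
print(any(ref is mod for ref in referrers))           # False

# So replacing the attribute is an ordinary dictionary assignment.
mod.__dict__["attr"] = new
print(mod.attr is new)  # True

# Instance attributes can be rewritten the same way through vars().
holder = Payload()
holder.attr = old
vars(holder)["attr"] = new
print(holder.attr is new)  # True
```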

A good Guppy/Heapy tutorial, while a bit old and incomplete, can be found on Andrey Smirnov’s website.

Examining paths

Let’s set up an example replacement using class instances:

class A(object):
    pass

class B(object):
    pass

a = A()
b = B()

Suppose we want to replace a with b. From the demo above, we know that we can get the Heapy set of a single object using hp.iso(). We also know we can use .referrers to get the set of objects that reference the given object:

>>> import guppy
>>> hp = guppy.hpy()
>>> print hp.iso(a).referrers
Partition of a set of 1 object. Total size = 1048 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100     1048 100      1048 100 dict of module

a is only referenced by one object. That makes sense, since we’ve only used it in one place: as a top-level variable in the interactive session, where locals() is the __dict__ of the __main__ module. So hp.iso(a).referrers.theone must be locals():

>>> hp.iso(a).referrers.theone is locals()
True

However, there is a more useful feature available to us: .pathsin. This also returns references to the given object, but instead of a Heapy set, it is a list of Path objects. These are more useful since they tell us not only what objects are related to the given object, but how they are related.

>>> print hp.iso(a).pathsin
 0: Src['a']

This looks very ambiguous. However, we find that we can extract the source of the reference using .src:

>>> path = hp.iso(a).pathsin[0]
>>> print path.src
Partition of a set of 1 object. Total size = 1048 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100     1048 100      1048 100 dict of module
>>> path.src.theone is locals()
True

…and, we can examine the type of relation by looking at .path[1] (the actual reason for this isn’t worth getting into, due to Guppy’s lack of documentation on the subject):

>>> relation = path.path[1]
>>> relation
<guppy.heapy.Path.Based_R_INDEXVAL object at 0x100f38230>

We notice that relation is a Based_R_INDEXVAL object. Sounds bizarre, but this tells us that a is a particular indexed value of path.src. What index? We can get this using relation.r:

>>> rel = relation.r
>>> print rel
a

Ah ha! So now we know that a is equal to the reference source (i.e., path.src.theone) indexed by rel:

>>> path.src.theone[rel] is a
True

But path.src.theone is just a dictionary, meaning we know how to modify it very easily:[4]

>>> path.src.theone[rel] = b
>>> a
<__main__.B object at 0x100dae090>
>>> a is b
True

Bingo. We’ve successfully replaced a with b, using a general method that should work for any case where a is in a dictionary-like object.

Handling different reference types

We’ll continue by wrapping this code up in a nice function, which we will expand as we go:

import guppy
from guppy.heapy import Path

hp = guppy.hpy()

def replace(old, new):
    for path in hp.iso(old).pathsin:
        relation = path.path[1]
        if isinstance(relation, Path.R_INDEXVAL):
            path.src.theone[relation.r] = new
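For readers without Guppy, here is a rough standard-library analogue of replace() so far, with gc.get_referrers() standing in for hp.iso(old).pathsin. It is a sketch, not the post’s implementation:

- replace_in_containers is a hypothetical name.
- It only sees referrers that the cycle collector tracks.
- It only handles dict values and list items.
- On newer CPython 3 versions, instance attributes may not live in a real dict at all, so they can be missed.

Writing into a class’s underlying dict this way would also bypass CPython’s type attribute cache, so the demo sticks to plain dicts and lists.

```python
import gc


def replace_in_containers(old, new):
    """Sketch of replace() without Guppy: dict values and list items only."""
    for ref in gc.get_referrers(old):
        if isinstance(ref, dict):
            for key, value in list(ref.items()):
                if value is old:
                    ref[key] = new
        elif isinstance(ref, list):
            for index, item in enumerate(ref):
                if item is old:
                    ref[index] = new


class A(object):
    pass


class B(object):
    pass


a, b = A(), B()
d1 = {1: a}
d2 = [{1: {0: ("foo", "bar", {"a": a, "b": b})}}]
L = [0, 1, 2, a, b]
replace_in_containers(a, b)
print(d1[1] is b)                # True
print(d2[0][1][0][2]["a"] is b)  # True
print(L[3] is b)                 # True
```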

Dictionaries, lists, and tuples

As noted above, this is versatile enough to handle many dictionary-like situations, including __dict__, which means we already know how to replace object attributes:

>>> a, b = A(), B()
>>> class X(object):
...     pass
>>> X.cattr = a
>>> x = X()
>>> x.iattr = a
>>> d1 = {1: a}
>>> d2 = [{1: {0: ("foo", "bar", {"a": a, "b": b})}}]
>>> replace(a, b)
>>> print a
<__main__.B object at 0x1042b9910>
>>> print X.cattr
<__main__.B object at 0x1042b9910>
>>> print x.iattr
<__main__.B object at 0x1042b9910>
>>> print d1[1]
<__main__.B object at 0x1042b9910>
>>> print d2[0][1][0][2]["a"]
<__main__.B object at 0x1042b9910>

Lists can be handled exactly the same as dictionaries, although the keys in this case (i.e., relation.r) will always be integers.

>>> a, b = A(), B()
>>> L = [0, 1, 2, a, b]
>>> print L
[0, 1, 2, <__main__.A object at 0x104598950>, <__main__.B object at 0x104598910>]
>>> replace(a, b)
>>> print L
[0, 1, 2, <__main__.B object at 0x104598910>, <__main__.B object at 0x104598910>]

Tuples are interesting. We can’t modify them directly because they’re immutable, but we can create a new tuple with the new value, and then replace that tuple just like we replaced our original object:

        # Meanwhile, in replace()...
        if isinstance(relation, Path.R_INDEXVAL):
            source = path.src.theone
            if isinstance(source, tuple):
                temp = list(source)
                temp[relation.r] = new
                replace(source, tuple(temp))
            else:
                source[relation.r] = new

As a result:

>>> a, b = A(), B()
>>> t1 = (0, 1, 2, a)
>>> t2 = (0, (1, (2, (3, (4, (5, (a,)))))))
>>> replace(a, b)
>>> print t1
(0, 1, 2, <__main__.B object at 0x104598e50>)
>>> print t2
(0, (1, (2, (3, (4, (5, (<__main__.B object at 0x104598e50>,)))))))

Bound methods

Here’s a fun one. Let’s upgrade our definitions of A and B:

class A(object):
    def func(self):
        return self

class B(object):
    pass

After replacing a with b, a.func no longer exists, as we’d expect:

>>> a, b = A(), B()
>>> a.func()
<__main__.A object at 0x10c4a5b10>
>>> replace(a, b)
>>> a.func()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'B' object has no attribute 'func'

But what if we save a reference to a.func before the replacement?

>>> a, b = A(), B()
>>> f = a.func
>>> replace(a, b)
>>> f()
<__main__.A object at 0x10c4b6090>

Hmm. So f has kept a reference to a somehow, but not in a dictionary-like object. So where is it?

Well, we can reveal it with the attribute f.__self__:

>>> f.__self__
<__main__.A object at 0x10c4b6090>

Unfortunately, this attribute is magical and we can’t write to it directly:

>>> f.__self__ = b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: readonly attribute

Python clearly doesn’t want us to re-bind bound methods, and a reasonable person would give up here, but we still have a few tricks up our sleeve. Let’s examine the internal C structure of bound methods, PyMethodObject:

[Diagram: PyMethodObject, the C struct behind f. Its fields, in order: struct _object* _ob_next, struct _object* _ob_prev, Py_ssize_t ob_refcnt, struct _typeobject* ob_type, PyObject* im_func, PyObject* im_self, PyObject* im_class, PyObject* im_weakreflist. im_self points to <__main__.A object at 0xdeadbeef>.]

The first four fields of the struct (_ob_next, _ob_prev, ob_refcnt, and ob_type) come from PyObject_HEAD, which exists in all Python objects. The first two come from _PyObject_HEAD_EXTRA. They only exist when the debugging macro Py_TRACE_REFS is defined, and they link every live object into a doubly linked list for reference debugging. The im_self field, which maintains the reference to our target object, is therefore either fourth or sixth in the struct, depending on Py_TRACE_REFS. If we can figure out the size of the field and its offset from the start of the struct, then we can set its value directly using ctypes.memmove():

ctypes.memmove(id(f) + offset, ctypes.byref(ctypes.py_object(b)), field_size)

Here, id(f) is the memory location of our method, which refers to the start of the C struct from above. offset is the number of bytes between this memory location and the start of the im_self field. We use ctypes.byref() to create a reference to the replacement object, b, which will be copied over the existing reference to a. Finally, field_size is the number of bytes we’re copying, equal to the size of the im_self field.

Well, all but one of these fields are pointers to structure types, meaning they have the same size,[5] equal to ctypes.sizeof(ctypes.py_object). This is (probably) 4 or 8 bytes, depending on whether you’re on a 32-bit or a 64-bit system. The other field is a Py_ssize_t object—possibly the same size as the pointers, but we can’t be sure—which is equal to ctypes.sizeof(ctypes.c_ssize_t).

We know that field_size must be ctypes.sizeof(ctypes.py_object), since we are copying a structure pointer. offset is this value multiplied by the number of structure pointers before im_self (4 if Py_TRACE_REFS is defined and 2 otherwise), plus ctypes.sizeof(ctypes.c_ssize_t) for ob_refcnt. But how do we determine whether Py_TRACE_REFS is defined? We can’t check the value of a macro at runtime, but we can check for the existence of sys.getobjects(), which is only defined when that macro is. Therefore, we can make our replacement like so:
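ctypes can double-check this arithmetic: mirror the struct with ctypes.Structure and read each field’s .offset. The field lists below are an assumed transcription of CPython 2.7’s PyMethodObject layout as described above. The code only computes offsets and never touches a live method object, so it is safe to run on any version.

```python
import ctypes

PTR = ctypes.sizeof(ctypes.py_object)    # size of a PyObject* field
SSIZE = ctypes.sizeof(ctypes.c_ssize_t)  # size of ob_refcnt

# Assumed mirror of CPython 2.7's PyMethodObject (non-debug build).
_METHOD_FIELDS = [
    ("ob_refcnt", ctypes.c_ssize_t),
    ("ob_type", ctypes.c_void_p),
    ("im_func", ctypes.c_void_p),
    ("im_self", ctypes.c_void_p),
    ("im_class", ctypes.c_void_p),
    ("im_weakreflist", ctypes.c_void_p),
]


class PyMethodObject(ctypes.Structure):
    _fields_ = _METHOD_FIELDS


class PyMethodObjectTraceRefs(ctypes.Structure):
    # _PyObject_HEAD_EXTRA adds two pointers when Py_TRACE_REFS is defined.
    _fields_ = [("_ob_next", ctypes.c_void_p),
                ("_ob_prev", ctypes.c_void_p)] + _METHOD_FIELDS


for trace_refs, layout in ((False, PyMethodObject),
                           (True, PyMethodObjectTraceRefs)):
    ptrs_in_struct = 4 if trace_refs else 2
    manual = ptrs_in_struct * PTR + SSIZE
    print("Py_TRACE_REFS=%s: ctypes offset %d, formula %d"
          % (trace_refs, layout.im_self.offset, manual))
```

On a typical 64-bit build, the two numbers agree at 24 bytes without Py_TRACE_REFS and 40 bytes with it.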

>>> import ctypes
>>> import sys
>>> field_size = ctypes.sizeof(ctypes.py_object)
>>> ptrs_in_struct = 4 if hasattr(sys, "getobjects") else 2
>>> offset = ptrs_in_struct * field_size + ctypes.sizeof(ctypes.c_ssize_t)
>>> ctypes.memmove(id(f) + offset, ctypes.byref(ctypes.py_object(b)), field_size)
>>> f.__self__ is b
True
>>> f()
<__main__.B object at 0x10a8af290>

Excellent—it worked!

There’s another kind of bound method, which is the built-in variety as opposed to the user-defined variety we saw above. An example is a.__sizeof__():

>>> a, b = A(), B()
>>> f = a.__sizeof__
>>> f
<built-in method __sizeof__ of A object at 0x10ab44b50>
>>> replace(a, b)
>>> f.__self__
<__main__.A object at 0x10ab44b50>

This is stored internally as a PyCFunctionObject. Let’s take a look at its layout:

[Diagram: PyCFunctionObject, the C struct behind f. Its fields, in order: struct _object* _ob_next, struct _object* _ob_prev, Py_ssize_t ob_refcnt, struct _typeobject* ob_type, PyMethodDef* m_ml, PyObject* m_self, PyObject* m_module. m_self points to <__main__.A object at 0xdeadbeef>.]

Fortunately, m_self here has the same offset as im_self from before, so we can just use the same code:
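The claim about m_self can be checked the same way, using a ctypes.Structure mirror and its field offsets. The field list is an assumed transcription of CPython 2.7’s PyCFunctionObject layout for a build without Py_TRACE_REFS. Nothing here touches a live object.

```python
import ctypes

PTR = ctypes.sizeof(ctypes.py_object)
SSIZE = ctypes.sizeof(ctypes.c_ssize_t)


# Assumed mirror of CPython 2.7's PyCFunctionObject (non-debug build).
class PyCFunctionObject(ctypes.Structure):
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
        ("m_ml", ctypes.c_void_p),      # PyMethodDef*
        ("m_self", ctypes.c_void_p),
        ("m_module", ctypes.c_void_p),
    ]


offset = 2 * PTR + SSIZE  # the im_self formula without Py_TRACE_REFS
print(PyCFunctionObject.m_self.offset == offset)  # True
```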

>>> ctypes.memmove(id(f) + offset, ctypes.byref(ctypes.py_object(b)), field_size)
>>> f.__self__ is b
True
>>> f
<built-in method __sizeof__ of B object at 0x10ab4f150>

Dictionary keys

Dictionary keys have a different reference relation type than values, but the replacement works mostly the same way. We pop the value of the old key from the dictionary, and then insert it in again under the new key. Here’s the code, which we’ll stick into the main block in replace():

elif isinstance(relation, Path.R_INDEXKEY):
    source = path.src.theone
    source[new] = source.pop(source.keys()[relation.r])

And, a demonstration:

>>> a, b = A(), B()
>>> d = {a: 1}
>>> replace(a, b)
>>> d
{<__main__.B object at 0x10fb47950>: 1}
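Here is a standard-library sketch of the dictionary-key case, with gc.get_referrers() standing in for Guppy’s R_INDEXKEY paths. replace_dict_keys is a hypothetical helper. It matches keys by identity and then uses the same pop-and-reinsert move as the post, so it assumes old and new are both hashable.

```python
import gc


def replace_dict_keys(old, new):
    """Sketch: move every dict entry keyed by `old` to the key `new`."""
    for ref in gc.get_referrers(old):
        if isinstance(ref, dict) and any(key is old for key in ref):
            # Pop the value stored under the old key, re-insert under the new.
            ref[new] = ref.pop(old)


class A(object):
    pass


class B(object):
    pass


a, b = A(), B()
d = {a: 1}
replace_dict_keys(a, b)
print(list(d.values()) == [1] and b in d)  # True
```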

Closure cells

We’ll cover just one more case, this time involving a closure. Here’s our test function:

def wrapper(obj):
    def inner():
        return obj
    return inner

As we can see, an instance of the inner function keeps references to the locals of the wrapper function, even after using our current version of replace():

>>> a, b = A(), B()
>>> f = wrapper(a)
>>> f()
<__main__.A object at 0x109446090>
>>> replace(a, b)
>>> f()
<__main__.A object at 0x109446090>

Internally, CPython implements this using things called cells. We notice that f.func_closure gives us a tuple of cell objects, and we can examine an individual cell’s contents with .cell_contents:

>>> f.func_closure
(<cell at 0x10ad9f478: instance object at 0x109446090>,)
>>> f.func_closure[0].cell_contents
<__main__.A object at 0x109446090>

As expected, we can’t just modify it…

>>> f.func_closure[0].cell_contents = b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: attribute 'cell_contents' of 'cell' objects is not writable

…because that would be too easy. So, how can we replace it? Well, we could go back to memmove, but there’s an easier way thanks to the ctypes module also exposing Python’s C API. Specifically, the PyCell_Set function (which seems to lack a pure Python equivalent) does exactly what we want. Since the function expects PyObject*s as arguments, we’ll need to use ctypes.py_object as a wrapper. Here it is:

>>> from ctypes import py_object, pythonapi
>>> pythonapi.PyCell_Set(py_object(f.func_closure[0]), py_object(b))
0
>>> f()
<__main__.B object at 0x10ad94dd0>

Perfect – the replacement worked. To tie it together with replace(), we’ll note that Guppy represents the cell contents relationship with Based_R_INTERATTR, for what I assume to be “internal attribute”. We can use this to find the cell object within the inner function that references our target object, and then use the method above to make the change:

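elif isinstance(relation, Path.R_INTERATTR):
    source = path.src.theone
    # CellType is assumed to be defined at module level; Python 2.7's types
    # module lacks it: CellType = type((lambda x: lambda: x)(0).func_closure[0])
    if isinstance(source, CellType):
        pythonapi.PyCell_Set(py_object(source), py_object(new))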
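Combining gc.get_referrers() with PyCell_Set gives a standard-library sketch of the cell case. replace_in_cells is a hypothetical helper. CellType is derived from a throwaway closure because Python 2.7’s types module does not export it. On Python 3.7 and later, cell_contents is writable, so a plain assignment would do the same job there.

```python
import ctypes
import gc

# Python 2.7's types module has no CellType, so take it from a closure.
CellType = type((lambda x: lambda: x)(0).__closure__[0])


def replace_in_cells(old, new):
    """Sketch: repoint every closure cell that holds `old` to `new`."""
    for ref in gc.get_referrers(old):
        if isinstance(ref, CellType):
            ctypes.pythonapi.PyCell_Set(ctypes.py_object(ref),
                                        ctypes.py_object(new))


def wrapper(obj):
    def inner():
        return obj
    return inner


class A(object):
    pass


class B(object):
    pass


a, b = A(), B()
f = wrapper(a)
replace_in_cells(a, b)
print(f() is b)  # True
```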

Other cases

There are many, many more types of possible replacements. I’ve written a more extensible version of replace() with some test cases, which can be viewed on Gist here.

Certainly, not every case is handled by it, but it seems to cover the majority that I’ve found through testing. There are a number of reference relations in Guppy that I couldn’t figure out how to replicate without doing something insane (R_HASATTR, R_CELL, and R_STACK), so some obscure replacements are likely unimplemented.

Some other kinds of replacements are known, but impossible. For example, replacing a class object that uses __slots__ with another class will not work if the replacement class has a different slot layout and instances of the old class exist. More generally, replacing a class with a non-class object won’t work if instances of the class exist. Furthermore, references stored in data structures managed by C extensions cannot be changed, since there’s no good way for us to track these.
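CPython applies a similar layout rule to ordinary __class__ assignment, which makes the limitation easy to observe from Python code. This illustrates CPython’s own check, not necessarily the mechanism the Gist uses for classes. The three classes are hypothetical.

```python
class A(object):
    __slots__ = ("x",)


class SameLayout(object):
    __slots__ = ("x",)


class OtherLayout(object):
    __slots__ = ("y", "z")


obj = A()
obj.__class__ = SameLayout  # allowed: identical slot layout
print(type(obj) is SameLayout)  # True

rejected = False
try:
    obj.__class__ = OtherLayout  # the C-level layouts differ
except TypeError as exc:
    rejected = True
    print("TypeError: %s" % exc)
```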


  1. ^ This post relies heavily on implementation details of CPython 2.7. While it could be adapted for Python 3 by examining changes to the internal structures of objects that we used above, that would be a lost cause if you wanted to replicate this on Jython or some other implementation. We are so dependent on concepts specific to CPython that you would need to start from scratch, beginning with a language-specific replacement for Guppy.

  2. ^ The DOT files used to generate graphs in this post are available on Gist.

  3. ^ They’re actually grouped together by clodo (“class or dict object”), which is similar to type, but groups __dict__s separately by their owner’s type.

  4. ^ Python’s documentation tells us not to modify the locals dictionary, but screw that; we’re gonna do it anyway.

  5. ^ According to the C99 and C11 standards; section in the former and in the latter: “All pointers to structure types shall have the same representation and alignment requirements as each other.”