There is also automatically generated documentation; see INSTALL, or possibly README.
The profiler can record information about all reachable objects at some point in time, and it has functions to analyse that information.
It knows about all built-in types, but not about types defined in extension modules.
We'll use this code to go through a few of the features of the profiler:
>>> foo = {}
>>> for i in range(1000):
...     foo[i] = range(100)
>>> while True:
...     pass
It's a totally useless piece of code, but never mind. It can go in example.py.
You need to tell the profiler when to scan memory for objects.
A good place to do it here is just before the while True loop, so before while True, put:
>>> import code
>>> from sizer import scanner
>>> objs = scanner.Objects()
>>> code.interact(local = {'objs': objs})
That will do the scanning and then bring up an interpreter console to play around in.
Let's find out what the biggest single objects here are:
>>> from sizer import formatting
>>> formatting.printsizes(objs, count = 10)
which should give something like:
Size   Total  Object
12188  12188  dict at -0x4830bf0c
 3716  15904  dict at -0x48312994
 3410  19314  str at 0x814f898
 2672  21986  dict at -0x482a21c4
 2641  24627  str at 0x8149808
 2514  27141  str at 0x8151280
 2429  29570  str at 0x8168528
 2163  31733  str at 0x81fdf28
 2048  33781  dict at -0x482a20fc
 1796  35577  dict at -0x4828cd7c
printsizes has three useful keyword parameters: sorted, threshold and count. The automatically generated documentation (mentioned above) has details of them.
You can look at individual objects from this list by looking them up in objs, which will give you a wrapper for the real object.
For example, objs[-0x4830bf0c] is a wrapper for the dictionary object at the top of the list.
>>> w = objs[-0x4830bf0c]
Now we can look at what object that represents:
>>> w.obj
which prints out a very long dictionary (which is in fact example.foo).
The wrapper for an object obj will be in objs[id(obj)]. For example, objs[id(sys)] is the wrapper for the sys module.
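For instance (a quick sketch; the exact repr of a wrapper will vary from run to run):

>>> import sys
>>> objs[id(sys)]
wrap module sys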
Other interesting fields of wrappers are:

- w.size - the number of bytes of memory used by the object, in this case 12188.
- w.type - the class of the object.
- w.children - a list of wrappers for the objects which this object references.

Dictionary wrappers, such as w, have a couple of extra fields, w.keys and w.values, which are lists of wrappers of the dictionary's keys and values, respectively. The wrappers are of type wrapper.ObjectWrapper. These fields are shown in the sketch below.
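A short console sketch of those fields, assuming w still wraps the big dictionary from above (which is example.foo, so its keys are ints and its values are lists made by range(100)):

>>> w.size
12188
>>> w.type
<type 'dict'>
>>> len(w.keys), len(w.values)   # one wrapper per key and one per value
(1000, 1000)
>>> w.values[0].obj[:5]          # each value of foo is a list made by range(100)
[0, 1, 2, 3, 4]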
At the moment, it's hard to see where a given object can be found. We can fix that like so:
>>> from sizer import annotate
>>> annotate.markparents(objs)
Now each wrapper will have a parents field:

>>> w.parents
[wrap dict at -0x481f14e4]
>>> w.parents[0].parents
[wrap module __main__, wrap frame at 0x8193e9c]
which shows that the dictionary is contained in some other dictionary (which happens to be __main__.__dict__), which is in turn contained in both the __main__ module and a frame object.
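If you want to follow a chain of referrers by hand, a tiny hypothetical helper (not part of sizer) might look like this:

>>> def refchain(w, depth = 3):
...     # Print w and up to depth-1 of its referrers, following the first parent each time.
...     for i in range(depth):
...         print w
...         if not w.parents:
...             break
...         w = w.parents[0]
...
>>> refchain(objs[-0x4830bf0c])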
You might not be interested in some objects. For example, many of the biggest objects here are docstrings. You can use the operations module to remove any objects you like:
>>> from sizer import operations
>>> nostr = operations.fix(operations.filterouttype(objs, str))
Now nostr will contain everything in objs that isn't a string. fix is needed because filterouttype doesn't repair the structure of the wrappers after removing some of them, so otherwise some functions (particularly those in the annotate module) would fail.
Anything that can be done on objs can now be done on nostr.
There are other filtering operations: operations.filtersize, operations.filtertype and operations.filterd. filterd is the most general.
Again, you must use operations.fix on the returned dictionaries before you call any function from annotate on them.
You can sort objects by size using operations.sorted, operations.toplist, operations.top and operations.pos.
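For example, you could pick out just the dictionaries and print the ten biggest, reusing calls that appear elsewhere on this page (a sketch):

>>> dicts = operations.fix(operations.filtertype(objs, dict))
>>> formatting.printsizes(dicts, count = 10)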
You can also find the total memory use for each type, using (for example):
>>> formatting.printsizesop(operations.bytype(objs), threshold = 1000)
Note: you must use printsizesop rather than printsizes here, since bytype doesn't return a dictionary of wrappers.
In this case this prints out:
Size    Total   Object
423860  423860  <type 'list'>
216657  640517  <type 'str'>
130592  771109  <type 'dict'>
 41160  812269  <type 'type'>
 33548  845817  <type 'tuple'>
 19456  865273  <type 'code'>
 14088  879361  <type 'int'>
 13332  892693  <type 'function'>
  4760  897453  <type 'builtin_function_or_method'>
  2080  899533  <type 'classobj'>
  1680  901213  <type 'frame'>
which is a list of all types taking up more than 1000 bytes of memory.
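Since anything that can be done on objs can be done on a filtered set, you can combine this with the filters above; for instance, to see the per-type totals with strings excluded, reuse nostr from earlier:

>>> formatting.printsizesop(operations.bytype(nostr), threshold = 1000)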
Incidentally, let's find out if most space is taken up by a single big list or lots of smaller ones:
# Get wrappers for only list objects.
>>> lists = operations.filtertype(objs, list)
# Sort and print those lists by size.
>>> formatting.printsizesop(operations.bysize(lists))
This prints:
Size    Total   Object
420000  420000  420
  1100  421100  1100
   896  421996  896
   756  422752  756
   192  422944  32
   168  423112  24
   164  423276  164
   116  423392  116
    80  423472  40
    76  423548  76
    72  423620  72
    72  423692  36
    60  423752  20
    56  423808  28
    52  423860  52
Although it's not clear, the "Object" column gives the size of each single object, and the "Size" column gives the total of all objects of that size. So you can see that almost all the space used up by lists is used up by lists of 420 bytes. We can get a dictionary containing only the 420-byte lists with operations.filtersize(lists, 420).
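Having singled those lists out, you can treat them like any other set of wrappers (a sketch reusing calls from above):

>>> biglists = operations.fix(operations.filtersize(lists, 420))
>>> formatting.printsizes(biglists, count = 3)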
Another function is operations.diff, which takes two sets of wrappers and returns a set containing the wrappers found in the second one but not the first.
For example, if you run the scanner at two different points in time, you can pass the results of the two scans to diff to get the new objects from the second scan.
Note: at the moment, you can't use most functions of annotate on the set that diff returns. Running operations.fix on it will result in most things in the set being removed.
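A sketch of that workflow (the allocation in the middle just stands in for whatever your program does between the two scans):

>>> before = scanner.Objects()
>>> leak = [range(100) for i in range(50)]   # stand-in for real work that allocates
>>> after = scanner.Objects()
>>> new = operations.diff(before, after)     # wrappers in the second scan only
>>> formatting.printsizes(new, count = 5)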
You can use the annotate.findcreators function to find out which functions created the largest and most objects.
Note: You must have a patched Python in order to use this function - it will not work at all without it.
We'll use this example, which can go in creatorex.py:
def a():
    for i in range(100):
        c()

def b():
    for i in range(200):
        c()

def c():
    global keep
    keep.append(range(1000))

keep = []
a()
b()
Each call to a() will make 100 lists, and each call to b() will make 200 lists, through calls to c(). They're appended to keep so that there's still a reference to them.
Now, let's get a set of objects scanned and make the profiler put some creation information together:
import creatorex
from sizer import scanner, annotate, formatting
objs = scanner.Objects()
creators = annotate.findcreators(objs)
Now we can print out which lines of code created the most objects:
>>> formatting.printsizes(creators.back, count=9)
Size     Total    Object
5527200  5527200  creatorex.py:11
  86744  5613944  <interpreter>:0
  41802  5655746  /usr/local/lib/python2.5/linecache.py:101
  34156  5689902  /usr/local/lib/python2.5/encodings/__init__.py:30
  24222  5714124  /usr/local/lib/python2.5/os.py:44
  18756  5732880  /usr/local/lib/python2.5/site.py:61
  15432  5748312  /usr/local/lib/python2.5/site-packages/sizer/scanner.py:38
   7881  5767444  /usr/local/lib/python2.5/os.py:49
   7613  5775057  /usr/local/lib/python2.5/site-packages/sizer/annotate.py:12
At the top you can see line 11 of creatorex.py, which is keep.append(range(1000)). So that line is creating the most objects (no surprise there).
Also, <interpreter> is not a real line of code; it represents all the objects that were created while no Python code was running (for example, objects created very early on).
Now we can find out which functions called c() in order to make those objects. The back field has (calling file, calling line) tuples as its keys:
>>> fromc = creators.back[("creatorex.py", 11)]
>>> formatting.printsizes(fromc.back)
Size     Total    Object
3684800  3684800  creatorex.py:7
1842400  5527200  creatorex.py:3
      0  5527200  creatorex.py:11
So b is shown as making the most objects.
Now let's look at the lines which called a(), which called c(), which made the objects, by going back from fromc:
>>> froma = fromc.back[("creatorex.py", 3)]
>>> formatting.printsizes(froma.back)
Size     Total    Object
1842400  1842400  creatorex.py:14
      0  1842400  creatorex.py:3
So these were made from the a() call, as you'd expect.
The second line here gives the objects created by a() when it was at the bottom of the stack - i.e. when no other function had called it. In this case, there's nothing.
Unfortunately, at the moment you can't reference that as froma.back[("creatorex.py", 3)] like any other line; you have to use froma.back[None] instead.
As it happens, this problem is also the reason for the dummy <interpreter> file. I'll try to fix it soon.
There are a couple of extra fields in things like creators: size, which gives the total size of objects created by the functions, and members, which gives the objects themselves.
The interface for this might change - it hasn't been here for long.
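If the nodes returned by back lookups carry these fields too (the phrase "things like creators" suggests they do), you could write, for example (a hedged sketch, with no output shown since I haven't confirmed it):

>>> fromc.size      # should match the 5527200 bytes shown for creatorex.py:11 above
>>> fromc.members   # wrappers for those objects themselves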
Looking at single objects is often not that useful: the individual objects are frequently small, while most space is used up by large collections of them.
For this reason there is a function annotate.groupby, which collects objects into groups.
It takes a set of wrappers, and a subset of the wrappers, as a list, dictionary or set, to use as heads of groups. It then assigns each object in the wrappers to a group and returns a set of group wrappers.
You can use the set of wrappers it returns as a normal set of object wrappers. Everything above can be done to it (if something can't, it's a bug) - the groups behave as if they were single objects.
Note: For best results, your copy of Python should be patched. You will still get results if it's not, but they might not be as accurate.
There is also a function annotate.simplegroupby, which groups objects into modules, threads and class instances, depending on three keyword parameters given to it. It works by finding all modules, threads and classes in the objects given to it and passing those as the set of objects to group by.
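As a hedged sketch of calling groupby directly, you could pick all module objects as the group heads, which is roughly what simplegroupby with modules = True is described as doing (that filtertype accepts types.ModuleType is an assumption here):

>>> import types
>>> heads = operations.filtertype(objs, types.ModuleType)   # module objects as group heads
>>> groups = annotate.groupby(objs, heads)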
Here's an example piece of code:
>>> class A(object):
...     def __init__(self):
...         self.list = [1,2,3,4,5]
>>> insts = [ A() for i in range(100) ]
Before calling groupby, the A instances and their lists are treated separately:
>>> objs = scanner.Objects()
>>> a = objs[id(insts[0])]
>>> a
wrap A at -0x4871a034
# The size of the object is apparently 16 bytes
>>> a.size
16
# The first thing here is the instance's __dict__
>>> a.children
(wrap {'list': list at -0x48678c64}, wrap type at 0x81c8e44)
# The second thing here is the list itself
>>> a.children[0].children
(wrap 'list', wrap list at -0x48678c64, wrap None, wrap None)
>>> a.children[0].children[1].obj
[1, 2, 3, 4, 5]
>>> a.children[0].children[1].size
40
Now we'll group the objects into instances:
>>> groups = annotate.simplegroupby(objs, classes=True)
The size is more useful now:
>>> ag = groups[id(insts[0])]
>>> ag
wrap A at -0x4871a034
>>> ag.size
180
A bit of an increase! The members field will tell you which objects were put into this group:
>>> import pprint
>>> pprint.pprint(ag.members)
{-1211011092: wrap list at -0x48678c64,
 -1211002292: wrap {'list': list at -0x48678c64},
 -1210993780: wrap A at -0x4871a034}
There's the instance itself, the list and a dictionary. In fact, the dictionary is taking up most of the space:
>>> d = ag.members[-1211002292]
>>> d
wrap {'list': list at -0x48678c64}
>>> d.size
124
In this case, you could save space (if you were creating lots of instances of A) by using __slots__.
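For instance, a variant of A using __slots__ (a standard Python feature, not part of the profiler) avoids allocating a per-instance __dict__ at all:

>>> class A(object):
...     __slots__ = ('list',)            # no per-instance __dict__ is created
...     def __init__(self):
...         self.list = [1, 2, 3, 4, 5]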
Using simplegroupby with modules = True is a good way of finding out approximately how much memory each module is using. For example:
>>> import xmlrpclib   # Just to make the list of modules a bit more interesting
>>> objs = scanner.Objects()
>>> mods = annotate.simplegroupby(objs, modules = True)
>>> formatting.printsizes(mods)
Size   Total   Object
69184   69184  module sizer.sizes
44315  113499  module linecache
44190  157689  module bisect
43949  201638  module xmlrpclib
38795  240433  module sizer.scanner
25844  266277  module codecs
20927  287204  module copy
20878  308082  module encodings
17474  325556  module base64
12185  337741  module sys
Unfortunately, the global functions of the profiler are counted here. I'll fix this soon. The results of scanning, grouping etc. are not counted, since the scanner ignores these.
If you have pydot installed, you can make graphs of objects (the kind with nodes and edges, not the y-against-x kind). With a large number of objects this will take a long time and just produce a mess, so it's sensible to group the objects first.
For example, on the code above, run:
from sizer import graph
graph.makegraph(mods, count = 15, proportional = True)
This will return the name of a PostScript file containing a graph of the biggest 15 modules (count = 15), with a node to represent each module. References between modules are given as edges, and the area of each node is proportional to the size of the module (proportional = True).