The idea of the article arose after watching one video, where the author analyzes various ways to create a list of identical elements. I became interested in this topic and began to delve into it. In particular, why in one case or another amount of memory occupied is different.
In this article, we will analyze how memory allocation for objects in Python is arranged. Then, briefly about how clearing memory of unused objects works. And finally, about the difference in memory usage using the example of the list, dict and tuple types.
Memory allocation
No memory is allocated directly from the code. All the work of allocating memory is shifted to the memory managers. There is one general «top» level manager that is responsible for allocating a large block of memory allocated to the program – the «arena». Occupies 256KB.
Further, the arena is divided into «pools» of 4KB. Each pool can contain only blocks of a predetermined size for that pool, ranging from 8 bytes to 512 bytes. An arena can contain pools of different sizes, but blocks in one pool are always the same size. And at the block level, the memory managers of each specific type of data work.
When a manager of a certain type requests memory for an object, it knows the size in advance and can immediately access the pool with the desired block size by placing the object in the first free block. If there are no free blocks or there are no pools of the required size, the top manager issues a new pool from the most crowded arena. If all arenas are occupied, a new arena is requested.
Interestingly, it is impossible to partially return the memory allocated for the arena as long as it has at least one non-empty pool in which there is at least one non-empty block. How blocks become empty, we’ll discuss in the next section.
Freeing up memory
In Python, there is no need to manually clear memory. If the object is no longer in use, it is all passed on to the interpreter itself and the two mechanisms: the reference counterandthe garbage collector.
When a variable is assigned a value, in fact, an object is first created in a suitable free block (how memory allocation works was understood in the previous section) and then a reference to this object is placed in the variable.
Reference count
Each created object has a special field – the reference count. It stores the number of objects referencing it. Increases its value, for example, when an assignment operation is used, or when an object becomes part of a list. When you delete a variable or when you use del, the reference count is reduced by 1. For example, when you shut down a function where this variable was declared.
Let’s look at a small piece of code using an example:
In this case, 1000 is an immutable object, one for the entire program. After the two variables are initialized, the reference count is 5.
Why 5? Because object 1000 is referenced not only by these two variables, but by all variables with a value of 1000 in all modules used. Next, remove the variable b and the reference count changes its value to 4. Once the counter reaches 0, the object releases the block.
Garbage collector
The link counter has a big drawback — the inability to catch circular references. If an object or more objects reference each other, the counter will never fall below 1.
Thegc module was created specifically to handle such cases. His job is to periodically scan objects of container type and determine the presence of circular references.
Speaking of a garbage collector, an important concept is thegenerationof objects – this is a set of objects that the collector monitors, and subsequently scans. There are three generations, each of which is scanned at a different frequency. More often, first-generation objects are scanned, because new objects get there. As a rule, such objects have a short lifespan and are temporary. Therefore, it is advisable to check them more. If objects have passed the garbage collector scan and have not been deleted, they are transferred to the next generation.
First-generation scanning begins when the number of container-type objects created exceeds the number of deleted objects by a specifiedthreshold. For example, a second-generation scan will begin when the number of first-generation scans exceeds a predetermined threshold. By default, the trigger thresholds are 700, 10, and 10, respectively.
You can watch them viagc.get_threshold(). Edit viagc.set_threshold().
All this is relevant for list, dict, tuple and more for classes. Not for simple types.
What’s there by data type
List
Let’s start comparisons with list. Since it is changeable and it will be interesting to see how it behaves when the number of elements changes. The list is an array of not a fixed size, with the ability to randomly insert or delete any element. A list item can be any data type. In terms of storing a list in memory, it consists of two blocks. One block is a fixed size and stores information about the list. The other stores references to elements and can move from block to block if the number of elements changes.
The size of the memory block occupied by the object can be viewed throughsys.getsizeof().
As we can see, an empty list object already occupies 64 bytes.
With the usual creation of a list through the enumeration of items, it will always occupy: 64 bytes empty list + 8 bytes for each item, because the list is an array of references to objects.
When created through list comprehension, the size will already be 96. That is larger than the size of an empty list + 8 bytes per item. This mechanism works by calling the append method on the list object that you are creating. append works as follows: depending on the number of items already present in the list, it reserves more memory in advance when an item is added. The additional amount of memory allocated does not always double the list. Space can be allocated for only a few items. For example, if another one is added to a list of 8 items, space will be reserved for eight more items. And when you add 16 items to a list of 16 items, space will be reserved for only 9 items. This avoids the cost of resizing the list with frequent append calls. At the same time, unused, but already allocated, memory is not available for reading.
When you create a list based on a tuple, the resulting list will take up 112 bytes. In this case, a place is reserved in advance for 3 more list items, in addition to those already present in the tuple.
In total, we get 64 bytes, + 8 * 3 are elements from the tuple, + 8 * 3 reserved space for new elements.
Tuple
A tuple is a fixed-length array specified when the object was created. Tuple elements can also be objects of any type. Unlike a list, a tuple in memory is represented by a single object. Because there is no variable part that needs to be moved between the blocks. Oh, and the tuple also has no methods for changing the elements. However, if the element itself belongs to a modifiable type, it can still be changed.
An empty tuple occupies a block of 48 bytes. Remember that motorcades are immutable. If you already have the same tuple in your memory, no new object will be created. Instead, the variable is assigned a reference to an existing object.
An example shows that the memory addresses of two tuples are the same. A similar comparison for lists would return False.
In the case of a non-empty motorcade with memory, everything is also simple. The block size will be 48 bytes + 8 bytes per element.
With the creation of a tuple from a list or other collection, there are also no such surprises as with a list. Because there is no need to lay a place for changing the number of elements.
Dict
A dictionary is an array of keys and an array of values, where each key is associated with a single value. A uniqueness restriction is imposed on the key within the dictionary. Therefore, keys can only be objects of immutable types. The value can be an object of any type.
Like lists, dictionaries are stored as two objects. The first contains information about the dictionary itself and always remains in the same block. The second stores key-value pairs and can move between blocks when resized. But at the same time, an empty dictionary takes up much more space.
As you can see in the example, the dictionary takes up 240 bytes. When you create a dictionary, space is allocated for multiple items, not just after you add an item, as was the case with the list.
If you call the clear method, not only all items are cleared. The original reserved memory will also be released.
As a result, the dictionary began to occupy only 72 bytes. Much less than when created.
As with a list, for each new key-value pair, the entire object is not moved to the new block. To avoid frequent travel costs, a new block is taken with a margin for several elements.
Conclusion
Knowing how much memory an object will occupy is useful. For example, a project or task condition specifies strict memory limits. Or a piece of code is called very often, you need to understand whether the allocation/release of memory will not be a bottleneck of your system.
Therefore, I recommend that you estimate the approximate memory costs at the time of writing the code. And no one cancels the reuse of existing facilities.
If you are interested in this topic and decide to delve into it, then pay attention to the following materials: