Python Garbage Collector

The GC (generational garbage collector) was created primarily to detect and delete objects caught in reference cycles. It is exposed through the built-in gc module, so if necessary it can be turned off and run manually (or not run at all). To understand why the GC was created, you first need to understand how Python's memory manager works and how it releases memory.
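A minimal sketch of pausing automatic collection and running the collector by hand. It relies only on documented gc functions (gc.isenabled, gc.disable, gc.collect, gc.enable); the helper name collect_manually is hypothetical.

```python
import gc


def collect_manually() -> int:
    """Run one full cyclic collection with automatic collection paused.

    Returns the number of unreachable objects reported by gc.collect().
    The previous enabled/disabled state is restored afterwards.
    """
    was_enabled = gc.isenabled()
    gc.disable()              # stop automatic (threshold-triggered) collections
    try:
        return gc.collect()   # a manual run still works while automatic GC is off
    finally:
        if was_enabled:
            gc.enable()       # restore the original state


state_before = gc.isenabled()
found = collect_manually()
print("unreachable objects found:", found)
print("GC state restored:", gc.isenabled() == state_before)
```

Restoring the previous state in a finally block matters because other code in the same process may rely on automatic collection being on.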

Python does not necessarily return memory to the operating system as soon as an object is deleted. For small objects (up to 512 bytes), CPython uses an additional memory manager, pymalloc. To serve such objects, it requests large blocks of memory from the operating system in advance and later places many small objects inside each block.

When a small object is deleted, its memory is not handed back to the operating system; Python keeps it for new objects of the same size. Only when one of the allocated blocks no longer contains any live objects can Python release that block to the operating system. In practice, this usually happens after a script has created and discarded a large number of temporary objects.

Thus, if a long-running Python process consumes more memory over time, that does not necessarily mean your code has a memory leak.
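To get a feel for which objects fall under the small-object limit, the sketch below compares sys.getsizeof() against 512 bytes. It is an approximation under stated assumptions: getsizeof() reports the object's size rather than the exact request the allocator saw, the 512-byte limit applies to recent default CPython builds (older releases used 256, and free-threaded builds use a different allocator), and sys._debugmallocstats() is a CPython-specific, underscore-prefixed helper that prints pymalloc arena and pool statistics to stderr.

```python
import sys

# Assumption: pymalloc serves requests of up to 512 bytes (recent default
# CPython builds). sys.getsizeof() only approximates the allocation request.
SMALL_REQUEST_LIMIT = 512

samples = {
    "int": 42,
    "short str": "abc",
    "empty dict": {},
    "1000 bytes": b"x" * 1000,
}

classification = {}
for name, value in samples.items():
    size = sys.getsizeof(value)
    small = size <= SMALL_REQUEST_LIMIT
    classification[name] = small
    where = "small-object allocator" if small else "general-purpose allocator"
    print(f"{name:>10}: {size:5d} bytes -> {where}")

# CPython-specific: dump pymalloc arena/pool statistics to stderr.
if hasattr(sys, "_debugmallocstats"):
    sys._debugmallocstats()
```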

The standard Python interpreter, CPython, uses two garbage collection algorithms together: reference counting and a generational garbage collector (hereinafter GC), which is controlled through the standard gc module.
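Reference counting can be observed directly with sys.getrefcount(). The absolute number it returns includes temporary references created by the call itself, so this sketch only compares differences; the variable names are illustrative.

```python
import sys

data = []                          # the list is referenced by the name `data`
before = sys.getrefcount(data)     # absolute value includes the call's temporary references
alias = data                       # a second reference to the same list
after = sys.getrefcount(data)
print("extra references after aliasing:", after - before)  # 1

del alias                          # dropping a reference decrements the count immediately
restored = sys.getrefcount(data) == before
print("back to the original count:", restored)             # True
```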

Reference counting is very simple and efficient, but it has one big drawback: it cannot detect reference cycles. An object in a cycle is always referenced by another member of the cycle, so its count never drops to zero, even when nothing outside the cycle can reach it. This is why Python has an additional collector, the generational GC, which tracks container objects that can take part in reference cycles.
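The sketch below, assuming CPython, builds a two-object cycle inside a function. Once the function returns, nothing outside the cycle refers to either object, yet reference counting cannot free them; only gc.collect() does. A weak reference (weakref.ref) checks whether the object is still alive without keeping it alive. The Node class and make_cycle helper are hypothetical.

```python
import gc
import weakref


class Node:
    """A minimal object that can point at another Node."""

    def __init__(self):
        self.partner = None


def make_cycle():
    """Build two Nodes that reference each other and return a weak reference to one."""
    a, b = Node(), Node()
    a.partner = b
    b.partner = a          # a -> b -> a: each count stays at 1 after the function returns
    return weakref.ref(a)  # a weak reference does not keep `a` alive


gc.disable()               # only reference counting runs until we call collect()
try:
    ref = make_cycle()
    alive_after_return = ref() is not None   # the cycle keeps itself alive
    gc.collect()                             # the cyclic GC finds and frees the cycle
    freed_after_collect = ref() is None
finally:
    gc.enable()

print(alive_after_return, freed_after_collect)   # True True
```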

In CPython, reference counting is fundamental and cannot be disabled, whereas the GC is optional and can be turned off.
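To see that turning the GC off leaves reference counting untouched, this sketch disables the collector and then drops the only reference to a simple object. On CPython the object is freed immediately; other implementations, such as PyPy, do not use reference counting and may behave differently. Resource is a hypothetical placeholder class.

```python
import gc
import weakref


class Resource:
    """Any object that supports weak references."""


gc.disable()                            # turn the cyclic collector off
try:
    obj = Resource()
    ref = weakref.ref(obj)
    del obj                             # the last strong reference disappears
    freed_immediately = ref() is None   # reference counting freed it right away
finally:
    gc.enable()

print("freed without the cyclic GC:", freed_immediately)   # True on CPython
```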

Unlike reference counting, which frees an object the moment its count reaches zero, the cyclic GC does not work in real time; it runs periodically. Each run briefly pauses the program, so CPython uses several heuristics to decide how often the collector should run.
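These pauses are easy to observe with gc.callbacks, a documented list of functions that CPython calls with phase "start" and "stop" around every collection, passing an info dict that includes the generation being collected. This is a measurement sketch only; record_pause and pauses are hypothetical names, and the timings depend entirely on the machine and the number of tracked objects.

```python
import gc
import time

pauses = []          # (generation, seconds) for every collection observed
_started_at = None


def record_pause(phase, info):
    """gc.callbacks hook: CPython calls it with phase 'start' or 'stop'."""
    global _started_at
    if phase == "start":
        _started_at = time.perf_counter()
    elif phase == "stop" and _started_at is not None:
        pauses.append((info["generation"], time.perf_counter() - _started_at))
        _started_at = None


gc.callbacks.append(record_pause)
try:
    tracked = [[] for _ in range(10_000)]   # container objects the collector must scan
    gc.collect()                            # force one full collection so a pause is recorded
finally:
    gc.callbacks.remove(record_pause)

for generation, seconds in pauses:
    print(f"generation {generation}: {seconds * 1000:.3f} ms")
```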

The cyclic garbage collector divides all tracked objects into three generations. New objects start in the youngest generation (generation 0). If an object survives a collection, it is moved to the next, older generation. The older the generation, the less often it is scanned for garbage. Since new objects often have a very short lifespan (they are temporary), it makes sense to scan them more often than objects that have already survived several collections.
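Generation membership can be inspected with gc.get_objects(generation=...), available since Python 3.8. The sketch below assumes the classic three-generation collector of the default CPython build (for example CPython 3.8 to 3.12); newer releases and free-threaded builds organise the collector differently, so the output may differ there. generation_of is a hypothetical helper.

```python
import gc


def generation_of(obj):
    """Return the index of the generation currently holding obj, or None."""
    for generation in range(3):
        if any(item is obj for item in gc.get_objects(generation=generation)):
            return generation
    return None


gc.disable()                     # keep automatic collections from moving objects mid-demo
try:
    survivor = [1, 2, 3]         # a new container object, kept alive by this name
    history = [generation_of(survivor)]       # starts in generation 0
    gc.collect(0)                             # collect only the youngest generation
    history.append(generation_of(survivor))   # survived, so promoted to generation 1
    gc.collect(1)                             # collect generations 0 and 1
    history.append(generation_of(survivor))   # promoted again, to the oldest generation
finally:
    gc.enable()

print(history)   # [0, 1, 2] with the classic three-generation collector
```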

Each generation has a counter and a threshold; when the counter exceeds the threshold, a collection is triggered. For generation 0, the counter holds the number of container-object allocations minus deallocations since the last collection; for the older generations, it holds the number of times the next younger generation has been collected since this one was last collected. Whenever a new container object is allocated, CPython checks these counters, and if a threshold has been exceeded, garbage collection begins.

If several generations exceed their thresholds at once, the oldest of them is collected, because collecting a generation also scans all younger generations. To reduce the number of full-collection pauses caused by long-lived objects, the oldest generation has an additional condition: a full collection runs only when the number of objects awaiting their first full collection exceeds 25% of the objects that survived the previous full collection.
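The rule that an older collection also covers the younger generations can be checked with gc.get_objects(generation=...): a brand-new object sits in generation 0, and an explicit generation-1 collection sweeps it up and promotes it straight to generation 2. Like the other introspection examples, this assumes the classic three-generation collector of the default CPython build (for example CPython 3.8 to 3.12); in_generation is a hypothetical helper.

```python
import gc


def in_generation(obj, generation):
    """True if obj is in the given generation's list of tracked objects."""
    return any(item is obj for item in gc.get_objects(generation=generation))


gc.disable()                    # prevent automatic collections during the demo
try:
    newcomer = [[]]             # a brand-new container, so it sits in generation 0
    started_young = in_generation(newcomer, 0)
    gc.collect(1)               # request a generation-1 collection...
    # ...which also scanned generation 0, so the survivor went straight to generation 2
    jumped_to_oldest = in_generation(newcomer, 2)
finally:
    gc.enable()

print(started_young, jumped_to_oldest)   # True True with the classic collector
```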

The default thresholds for the three generations are 700, 10 and 10 respectively (in CPython 3.12, for example). You can read the current values with gc.get_threshold() and change them with gc.set_threshold().
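A minimal sketch of reading and temporarily changing the thresholds with gc.get_threshold() and gc.set_threshold(). The tuned values are hypothetical and only illustrate the call; whether raising thresholds helps depends on the workload, so the original settings are restored in a finally block.

```python
import gc

original = gc.get_threshold()       # a 3-tuple, e.g. (700, 10, 10) on CPython 3.12
print("current thresholds:", original)

try:
    # Hypothetical tuning: run generation-0 collections less often during an
    # allocation-heavy phase. The numbers are illustrative, not a recommendation.
    gc.set_threshold(5000, 20, 20)
    tuned = gc.get_threshold()
    print("tuned thresholds:", tuned)
finally:
    gc.set_threshold(*original)     # always restore the previous settings

print("restored:", gc.get_threshold() == original)
```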
