It takes about 5 minutes to read this article.
If your program exits after a single run, you may never notice the importance of memory management. But if the program you write has to run around the clock (24x7), memory management matters a great deal; critical services in particular must not leak memory.
A memory leak here does not mean that data in memory is lost or that memory physically disappears. It means that, because of a design flaw in the program, memory that should have been released is never actually released. Available system memory then runs seriously short, and the system or service may crash as a result.
How does Python perform garbage collection? In other words, how does Python reclaim memory space that is no longer used?
As we all know, everything in Python is an object, and every object occupies a certain amount of memory. We access an object through a variable, and a variable is essentially a pointer to (the address of) the object.
How do we decide which object's space to reclaim? One method comes to mind easily: when no variable points to an object any more, the object is useless and the space it occupies can be reclaimed. This is in fact what Python does. Let's look at a piece of code and its result:
import os
import psutil

# Display the memory size occupied by the current Python process
def show_memory_info(hint):
    pid = os.getpid()
    p = psutil.Process(pid)
    info = p.memory_info()
    memory = info.rss / 1024.0 / 1024  # resident set size in MB
    print("Memory usage {}: {} MB".format(hint, memory))

def func():
    show_memory_info("before func is called")
    a = [i for i in range(10000000)]
    show_memory_info("before func call ends")

if __name__ == "__main__":
    func()
    show_memory_info("after func call ends")
The results are as follows:
Memory usage before func is called: 29.63671875 MB
Memory usage before func call ends: 414.1640625 MB
Memory usage after func call ends: 27.2265625 MB
This example shows that after the large list a is created inside func, memory usage quickly rises to about 400 MB, and that it drops back to about 27 MB once the call to func ends. In other words, when func returns, Python knows that a is no longer used and reclaims it.
From another perspective: the list a created in the function is bound to a local variable. When the function returns, the local variable's reference goes away; the reference count of the list object drops to 0, Python reclaims it, and the large amount of memory it occupied is released.
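The same pattern can be observed without psutil. The following is a minimal sketch that uses the standard library's tracemalloc module instead; it measures memory allocated by Python objects rather than the process's resident memory, so the numbers will not match the psutil output above. The smaller list size is an assumption chosen to keep the run short.

```python
import tracemalloc

def func():
    # Build a large list and report traced memory while it is still alive.
    a = [i for i in range(1_000_000)]
    current, _peak = tracemalloc.get_traced_memory()
    return current

tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()
inside = func()
after, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

print("before func is called: {:.1f} MB".format(before / 1024 / 1024))
print("before func call ends: {:.1f} MB".format(inside / 1024 / 1024))
print("after func call ends:  {:.1f} MB".format(after / 1024 / 1024))
```

The middle value should be far larger than the other two, because the list is reclaimed as soon as func returns.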
If we modify the variable a in the func function to be a global variable, then a will still be used after the function call ends, and the memory will not be reclaimed at this time:
def func():
    show_memory_info("before func is called")
    global a
    a = [i for i in range(10000000)]
    show_memory_info("before func call ends")
The execution result is:
Memory usage before func is called: 29.625 MB
Memory usage before func call ends: 416.796875 MB
Memory usage after func call ends: 416.80078125 MB
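The difference between a local reference and one that outlives the call can also be checked directly with a weak reference, which does not keep its target alive. This is only a sketch: Payload and _cache are hypothetical names, and a module-level dictionary stands in for the global variable. The immediate reclamation it shows is CPython behavior.

```python
import weakref

class Payload:
    """Stand-in for a large object; plain lists cannot be weakly referenced."""

_cache = {}  # hypothetical module-level container that outlives every call

def local_only():
    obj = Payload()
    return weakref.ref(obj)  # the only strong reference dies on return

def stored_globally():
    obj = Payload()
    _cache["payload"] = obj  # a strong reference survives the call
    return weakref.ref(obj)

local_ref = local_only()
global_ref = stored_globally()

local_freed = local_ref() is None        # reclaimed when local_only returned
global_alive = global_ref() is not None  # still referenced from _cache
_cache.clear()
global_freed = global_ref() is None      # reclaimed once the last reference is gone

print(local_freed, global_alive, global_freed)
```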
That is, once an object's reference count drops to 0, the Python interpreter can reclaim it. So the question is: how do we find out how many references an object has?
Fortunately, the Python standard library provides the function sys.getrefcount(var), which returns the reference count of the object a variable refers to. It is used as follows:
import sys

a = []

# Two references: one from a and one from getrefcount
print(sys.getrefcount(a))

def func(a):
    # Four references: a, the Python function call stack, the function parameter, and getrefcount
    print(sys.getrefcount(a))

func(a)

# Two references: one from a and one from getrefcount; the call to func no longer exists
print(sys.getrefcount(a))
Output:
2
4
2
In short, sys.getrefcount() reports the number of references to an object. The code should be easy to follow, but don't forget that the call to getrefcount itself adds one reference.
Also note that during a function call there are two additional references: one from the function call stack and one from the function parameter. (The exact numbers are a CPython implementation detail and can differ between versions.)
import sys

a = []
print(sys.getrefcount(a))  # 2 references
b = a
print(sys.getrefcount(a))  # 3 references
c = b
d = b
e = c
f = e
g = d
print(sys.getrefcount(a))  # 8 references
Look at this code carefully. The variables a, b, c, d, e, f and g all refer to the same object. sys.getrefcount() does not count the references to a particular variable (pointer); it counts the references to the object, so in the end there are eight references in total.
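Because the absolute value returned by sys.getrefcount() includes temporary references, it is often easier to reason about how the count changes. A small sketch of that idea follows; the names base and holder are illustrative only.

```python
import sys

a = []
base = sys.getrefcount(a)             # baseline, including the temporary reference

b = a                                 # a second name for the same object
after_alias = sys.getrefcount(a)

holder = [a]                          # a container slot also counts as a reference
after_container = sys.getrefcount(a)

del b
holder.clear()
after_cleanup = sys.getrefcount(a)

print(after_alias - base, after_container - base, after_cleanup - base)
```

Each new name or container slot adds one to the count, and removing it subtracts one again.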
Now you understand that Python will automatically collect garbage.
Python reclaims memory automatically, but what if I want to reclaim memory manually? That is simple. If there is a variable a that you no longer need, two statements do the job:
import gc

del a
gc.collect()
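One detail is worth keeping in mind: del removes a name, not the object. The sketch below (Payload and probe are hypothetical names) suggests that the object is reclaimed only when its last reference disappears, and that on CPython this happens through reference counting before gc.collect() is even called.

```python
import gc
import weakref

class Payload:
    """Stand-in object that supports weak references."""

a = Payload()
b = a                      # a second reference to the same object
probe = weakref.ref(a)     # a weak reference does not keep the object alive

del a
still_alive = probe() is not None    # b still refers to the object

del b
freed_by_refcount = probe() is None  # last reference gone, object reclaimed

unreachable_found = gc.collect()     # number of unreachable objects found

print(still_alive, freed_by_refcount)
```

gc.collect() matters mainly for objects that reference counting alone cannot reclaim.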
Some readers probably think they have it all figured out by now. So suppose an interviewer asks: is a reference count of 0 a necessary and sufficient condition for an object to be garbage collected? Are there other possibilities?
If that stumps you, don't worry. Let's take a small step first and think about this question: if two objects reference each other and are no longer referenced by anything else, should they be garbage collected?
import os
import psutil

# Display the memory size occupied by the current Python process
def show_memory_info(hint):
    pid = os.getpid()
    p = psutil.Process(pid)
    info = p.memory_info()
    memory = info.rss / 1024.0 / 1024  # resident set size in MB
    print("Memory usage {}: {} MB".format(hint, memory))

def func2():
    show_memory_info("before func2 is called")
    a = [i for i in range(10000000)]
    b = [i for i in range(10000000)]
    a.append(b)
    b.append(a)
    show_memory_info("before func2 call ends")

if __name__ == "__main__":
    func2()
    show_memory_info("after func2 is called")
The output is as follows:
Memory usage before func2 is called: 29.65625 MB
Memory usage before func2 call ends: 795.01953125 MB
Memory usage after func2 is called: 795.09375 MB
Here a and b reference each other. Both are local variables, so after func2 returns the names a and b no longer exist as far as the program is concerned. Yet the memory is clearly still in use. Why? Because of the mutual references, neither object's reference count is 0.
Imagine this code in a production environment. Even if a and b occupy little space at first, objects leaked this way accumulate over a long run; if nothing collects them, the memory used by Python keeps growing until it brings the server down, with disastrous consequences.
Of course, someone might say that two objects referencing each other is easy to spot and therefore not a big problem. A more insidious case is a reference cycle that runs through several objects. In more complex engineering code, such a cycle may not be easy to discover.
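One way to hunt for such cycles is to ask the collector to keep what it finds instead of freeing it. The sketch below assumes the standard gc.DEBUG_SAVEALL flag, which makes the collector store unreachable objects in gc.garbage; Node and build_hidden_cycle are hypothetical names used for illustration.

```python
import gc

class Node:
    def __init__(self, name):
        self.name = name
        self.next = None

def build_hidden_cycle():
    # a -> b -> c -> a: no pair references each other directly
    a, b, c = Node("a"), Node("b"), Node("c")
    a.next, b.next, c.next = b, c, a

gc.collect()                      # start from a clean slate
gc.set_debug(gc.DEBUG_SAVEALL)    # keep unreachable objects in gc.garbage
build_hidden_cycle()
gc.collect()
cycle_members = sorted(o.name for o in gc.garbage if isinstance(o, Node))
gc.set_debug(0)                   # restore normal behavior
gc.garbage.clear()

print(cycle_members)
```

This is a debugging aid only; leaving DEBUG_SAVEALL enabled would itself keep garbage alive.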
If you are worried that a reference cycle might slip through unnoticed, you can call gc.collect() explicitly to trigger garbage collection. For example, call gc.collect() right after the call to func2 in the code above:
import gc  # add this next to the other imports at the top of the file

if __name__ == "__main__":
    func2()
    gc.collect()
    show_memory_info("after func2 is called")
The result is:
Memory usage before func2 is called: 29.625 MB
Memory usage before func2 call ends: 804.62109375 MB
Memory usage after func2 is called: 30.54296875 MB
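The effect of gc.collect() on a cycle can also be checked without measuring process memory. In this sketch, Node and make_cycle are hypothetical names, and automatic collection is switched off temporarily so that the result does not depend on when the collector happens to run.

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.other = None

def make_cycle():
    a, b = Node(), Node()
    a.other = b
    b.other = a
    return weakref.ref(a)   # lets us observe a without keeping it alive

gc.disable()                # make the demonstration deterministic
try:
    probe = make_cycle()
    alive_before = probe() is not None   # reference counts never reached 0
    collected = gc.collect()             # the cycle collector finds the pair
    alive_after = probe() is not None
finally:
    gc.enable()

print(alive_before, alive_after)
```

Reference counting alone leaves the pair alive after make_cycle returns; the explicit collection reclaims it.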
That was a demonstration of manual collection. In fact, Python can handle this automatically: it uses a mark-sweep algorithm together with generational collection to reclaim objects involved in circular references.
Let's look at mark-sweep first, starting with the graph-theory notion of unreachability. In a directed graph, start a traversal from a node and mark every node the traversal visits; when the traversal ends, all unmarked nodes are unreachable. Unreachable objects can no longer be used by the program, so they should be garbage collected. Traversing the whole object graph every time would be a huge waste of performance, however. In Python's garbage collection implementation, mark-sweep therefore keeps track of objects in a doubly linked list and only considers container objects (only container objects can form circular references). I won't go into the details of the algorithm here; after all, our focus is on application.
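The notion of unreachability can be illustrated with a toy graph. The sketch below is not CPython's implementation; it only shows the mark phase on a hypothetical dictionary-based graph, where find_unreachable and the node names are made up for the example.

```python
def find_unreachable(graph, roots):
    """Mark every node reachable from the roots; return the unmarked ones."""
    marked = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node in marked:
            continue
        marked.add(node)
        stack.extend(graph.get(node, ()))
    return set(graph) - marked

# "x" and "y" reference each other, but nothing reachable from "root" points to them.
graph = {
    "root": ["a"],
    "a": ["b"],
    "b": [],
    "x": ["y"],
    "y": ["x"],
}
unreachable = find_unreachable(graph, ["root"])
print(sorted(unreachable))  # ['x', 'y']
```

The sweep phase would then free exactly the nodes in unreachable, even though each of them is still referenced by the other.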
Now generational collection. Python divides the tracked objects into three generations. A newly created object belongs to generation 0; each time an object survives a garbage collection, it moves from its current generation to the next one. The threshold that triggers automatic collection can be set separately for each generation. For generation 0, collection starts when the number of newly created objects minus the number of deleted objects exceeds its threshold; the thresholds of the older generations control how often those generations are examined as well. Generational collection rests on the observation that new objects are more likely to become garbage, while objects that have survived longer are more likely to keep surviving. This approach saves a lot of computation and so improves Python's performance.
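The thresholds can be inspected and changed through the gc module. The following is a small sketch; the value 1000 is an arbitrary example rather than a recommendation, and the default values and their precise meaning depend on the Python version.

```python
import gc

original = gc.get_threshold()   # one threshold per generation
counts = gc.get_count()         # current collection counters

print("thresholds:", original)
print("counts:", counts)

# Let generation 0 collections run after more allocations than before,
# keeping the other thresholds unchanged.
gc.set_threshold(1000, original[1], original[2])
tuned = gc.get_threshold()

gc.set_threshold(*original)     # restore the previous settings
restored = gc.get_threshold()
```

Raising the first threshold trades fewer collector runs for more garbage held between runs.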
You should now be able to answer the interviewer's question. Reference counting is the simplest mechanism, but a reference count of 0 is not a necessary and sufficient condition for garbage collection; it is a sufficient but not necessary condition. As for the other possibilities, the circular references discussed above are one of them.
For circular references like the ones above, is there a way to show the reference relationships between objects as a diagram, so that memory leaks are easier to debug? There is: objgraph, a very useful package for visualizing reference relationships. I mainly recommend two of its functions: show_refs(), which draws the objects reachable from a given object, and show_backrefs(), which draws the objects that refer to it.
import objgraph
a =[1,2,3]
b =[4,5,6]
a.append(b)
b.append(a)
objgraph.show_refs([a])
objgraph.show_backrefs([a])
Both functions draw the reference relationships as a graph; run the code to see the result. This is very convenient for debugging. The official documentation is at https://mg.pov.lt/objgraph/.
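When objgraph is not installed, a rough text-only view of the same relationships is available from the standard library. This sketch uses gc.get_referents() and gc.get_referrers() on the same two lists; it only checks direct references and draws nothing.

```python
import gc

a = [1, 2, 3]
b = [4, 5, 6]
a.append(b)
b.append(a)

# Objects that a refers to directly (roughly what show_refs starts from).
a_points_to_b = any(obj is b for obj in gc.get_referents(a))

# Objects that refer to a (roughly what show_backrefs starts from).
b_points_back_to_a = any(obj is b for obj in gc.get_referrers(a))

print(a_points_to_b, b_points_back_to_a)
```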
1. Python performs garbage collection automatically.
2. Reclaiming an object when its reference count reaches 0 is the simplest case; circular references also have to be handled.
3. Python has two automatic collection algorithms for them: mark-sweep and generational collection.
4. For debugging memory leaks, objgraph is a good visual analysis tool.
(Finish)