Exploring how variables are shared between processes in Python multiprocessing
2020-11-27 14:41:46 Editor: 小采


1. The problem:

Someone in a chat group posted the following code and asked why the list printed at the end is empty:

from multiprocessing import Process, Manager
import os


def testFunc(cc, vip_list):
    vip_list.append(cc)
    print('process id:', os.getpid())


if __name__ == '__main__':
    manager = Manager()
    vip_list = []
    # vip_list = manager.list()

    processes = []
    for ll in range(10):
        # Pass the list explicitly: a module-level Manager() fails under
        # the "spawn" start method (the default on Windows and macOS).
        t = Process(target=testFunc, args=(ll, vip_list))
        t.daemon = True
        processes.append(t)

    for t in processes:
        t.start()

    for t in processes:
        t.join()

    print("------------------------")
    print('process id:', os.getpid())
    print(vip_list)

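If you understand Python's threading model and the GIL, and how multithreading and multiprocessing work, this question is not hard to answer. If you don't, that's fine too. Run the code above and you will see what the problem is.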

python aa.py
process id: 632
process id: 635
process id: 637
process id: 633
process id: 636
process id: 634
process id: 639
process id: 638
process id: 1
process id: 0
------------------------
process id: 619
[]

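The list is empty because each child process runs in its own address space. `vip_list.append(cc)` modifies the child's own copy of the list, and the parent's `vip_list` is never touched. Uncomment the `vip_list = manager.list()` line and you get this instead: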

process id: 32074
process id: 32073
process id: 32072
process id: 32078
process id: 32076
process id: 32071
process id: 32077
process id: 32079
process id: 32075
process id: 32080
------------------------
process id: 32066
[3, 2, 1, 7, 5, 0, 6, 8, 4, 9]

2. Ways to share variables between Python processes:
(1) Shared memory:
Data can be stored in a shared memory map using Value or Array. The following example is adapted from the Python documentation:

http://docs.python.org/2/library/multiprocessing.html#sharing-state-between-processes

from multiprocessing import Process, Value, Array


def f(n, a):
    n.value = 3.1415927
    for i in range(len(a)):
        a[i] = -a[i]


if __name__ == '__main__':
    num = Value('d', 0.0)        # 'd': C double in shared memory
    arr = Array('i', range(10))  # 'i': C signed int array in shared memory

    p = Process(target=f, args=(num, arr))
    p.start()
    p.join()

    print(num.value)
    print(arr[:])

Output:

3.1415927
[0, -1, -2, -3, -4, -5, -6, -7, -8, -9]

(2) Server process:

A manager object returned by Manager() controls a server process which holds Python objects and allows other processes to manipulate them using proxies.
A manager returned by Manager() will support types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Queue, Value and Array.
For code, see the example at the beginning of this article.

http://docs.python.org/2/library/multiprocessing.html#managers
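The opening example uses a managed list. Other types from the list above work the same way. The following is a minimal sketch with a managed `dict`. The helper names `record_square` and `collect_squares` are hypothetical, chosen for illustration only. Each child writes its own key through the dict proxy, so no two processes touch the same entry.

```python
from multiprocessing import Manager, Process


def record_square(shared_dict, n):
    # Runs in a child process. The assignment is forwarded through the
    # dict proxy to the manager's server process, so the parent sees it.
    shared_dict[n] = n * n


def collect_squares(count):
    # Leaving the with-block shuts down the manager's server process.
    with Manager() as manager:
        squares = manager.dict()
        workers = [Process(target=record_square, args=(squares, n))
                   for n in range(count)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        # copy() returns an ordinary dict, which stays usable after
        # the manager has shut down.
        return squares.copy()


if __name__ == '__main__':
    print(collect_squares(5))
```

Key order in the printed dict depends on which child finishes first, but the contents are always the squares of 0 to 4.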
3. Multiprocessing has more problems than this: data synchronization

Here is a simple piece of code, a counter:

from multiprocessing import Process, Manager
import os


def testFunc(cc, counter):
    # Not atomic: reads the value from the manager, then writes it back.
    counter.value += cc


if __name__ == '__main__':
    manager = Manager()
    counter = manager.Value('i', 0)

    processes = []
    for ll in range(100):
        t = Process(target=testFunc, args=(1, counter))
        t.daemon = True
        processes.append(t)

    for t in processes:
        t.start()

    for t in processes:
        t.join()

    print("------------------------")
    print('process id:', os.getpid())
    print(counter.value)

Output:

------------------------
process id: 17378
97

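You might ask: WTF? This problem already existed with multithreading, and multiprocessing simply repeats the same tragedy. The `+= cc` on the shared value is a read followed by a separate write. Two processes can therefore read the same old value, and one of the increments is lost. The fix is the same as with threads: a lock!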

from multiprocessing import Process, Manager, Lock
import os


def testFunc(cc, counter, lock):
    # Hold the lock across the whole read-modify-write so that no other
    # process can interleave between the read and the write.
    with lock:
        counter.value += cc


if __name__ == '__main__':
    lock = Lock()
    manager = Manager()
    counter = manager.Value('i', 0)

    processes = []
    for ll in range(100):
        t = Process(target=testFunc, args=(1, counter, lock))
        t.daemon = True
        processes.append(t)

    for t in processes:
        t.start()

    for t in processes:
        t.join()

    print("------------------------")
    print('process id:', os.getpid())
    print(counter.value)

How well does this code perform? Run it, or try increasing the loop count.
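One alternative is to keep the counter in shared memory with `Value` (section 2, part 1) and use the lock it already carries. This avoids a round trip to the manager's server process on every access. The following is a minimal sketch, not a benchmark, and no timing claims are made here. The helper names `add_many` and `run_counter` are hypothetical. With the default `lock=True`, `Value` wraps the ctypes object together with a recursive lock that `get_lock()` returns. Holding that lock is still necessary, because `+= 1` is a separate get and set even on the synchronized wrapper.

```python
from multiprocessing import Process, Value


def add_many(counter, times):
    for _ in range(times):
        # Hold the Value's own lock across the read-modify-write.
        with counter.get_lock():
            counter.value += 1


def run_counter(workers, times):
    counter = Value('i', 0)  # C int living in shared memory
    procs = [Process(target=add_many, args=(counter, times))
             for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == '__main__':
    print(run_counter(4, 10000))  # prints 40000
```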
4. Final advice:

Note that sharing data between processes is usually not the best choice, because of all the synchronization issues; an approach in which actors exchange messages is usually seen as a better choice. See also the Python documentation: "As mentioned above, when doing concurrent programming it is usually best to avoid using shared state as far as possible. This is particularly true when using multiple processes. However, if you really do need to use some shared data then multiprocessing provides a couple of ways of doing so."
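Applied to the opening example, the message-passing approach looks like this. Instead of ten processes appending to a shared list, each process sends its value to the parent over a `multiprocessing.Queue`. This is a minimal sketch with hypothetical names (`worker`, `gather`). A plain queue stands in for the actor-style messaging the advice describes; it is not a full actor framework.

```python
from multiprocessing import Process, Queue


def worker(n, results):
    # Instead of mutating shared state, send the result as a message.
    results.put(n)


def gather(count):
    results = Queue()
    procs = [Process(target=worker, args=(n, results))
             for n in range(count)]
    for p in procs:
        p.start()
    # Read one message per worker before joining. A process that has put
    # items on a queue does not exit until they are flushed to the
    # underlying pipe, so joining first can deadlock.
    collected = [results.get() for _ in range(count)]
    for p in procs:
        p.join()
    return collected


if __name__ == '__main__':
    print(sorted(gather(10)))  # prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Messages arrive in completion order, as with the managed list earlier, so the result is sorted before printing.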

5. References:

http://stackoverflow.com/questions/14124588/python-multiprocessing-shared-memory

http://eli.thegreenplace.net/2012/01/04/shared-counter-with-pythons-multiprocessing/

http://docs.python.org/2/library/multiprocessing.html#multiprocessing.sharedctypes.synchronized
