Python multiprocessing HOWTO

Using Python’s multiprocessing module is quite simple, even though it has some pitfalls. Once you know how to avoid them, it can really speed up some tasks.

In the examples I will use a simple program that calculates Fibonacci numbers.
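The original listing is not preserved here. A minimal sketch of such a program, assuming a naive recursive fib and reusing the names main, fib and fibonacci from the text, could look like this. NUMBERS is a name introduced for this sketch, and its default is deliberately smaller than range(30, 37) so the sketch finishes in about a second.

```python
import time

# Hypothetical default, smaller than range(30, 37) so the sketch finishes
# quickly; increase it until a run takes 10-20 seconds on your machine.
NUMBERS = range(25, 32)


def fib(n):
    # Deliberately slow, naive recursion: this is the CPU-bound work.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


def fibonacci(n):
    # Thin wrapper around fib(); later steps change only this wrapper.
    return fib(n)


def main(numbers=NUMBERS):
    start = time.time()
    results = []
    for n in numbers:
        value = fibonacci(n)
        print(f"fib({n}) = {value}")
        results.append((n, value))
    print(f"Total time: {time.time() - start:.1f} s")
    return results


if __name__ == "__main__":
    main()
```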

main is the main function controlling the program, fib does the actual work, and fibonacci is a wrapper that we will later change to use multiprocessing without changing fib. Adjust the numbers in range(30, 37) to match the speed of your computer, so that the program takes about 10-20 seconds to run. The output should look like this:

First we need to restructure the program so that fibonacci is started only once. I will use a Queue to pass the data to it. Add:

and change the main and fibonacci functions to look like this:

What changed? fibonacci is now a loop that takes values from the queue and processes them one after another. The loop ends when it receives None as a value (the “poison pill”, lines 4-5). main puts the data into the queue, followed by the poison pill (lines 16-18).
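A sketch of this single-queue step, assuming multiprocessing.Queue is the Queue meant here, might look as follows. fibonacci is still called directly in the same process; requests is a queue name chosen for this sketch, and the smaller NUMBERS default is again an assumption.

```python
import time
from multiprocessing import Queue

NUMBERS = range(25, 32)  # hypothetical; adjust as described above


def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


def fibonacci(requests):
    # Process numbers one after another until the poison pill arrives.
    while True:
        n = requests.get()
        if n is None:  # poison pill: no more work
            break
        print(f"fib({n}) = {fib(n)}")


def main(numbers=NUMBERS):
    requests = Queue()
    for n in numbers:
        requests.put(n)
    requests.put(None)  # the poison pill goes in after all the data
    start = time.time()
    fibonacci(requests)  # still a plain call in the same process
    print(f"Total time: {time.time() - start:.1f} s")


if __name__ == "__main__":
    main()
```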

But wait! Now there is no way to send a reply from fibonacci! So let’s create another queue to receive the results. main and fibonacci change again:

What’s going on: main puts requests into the first queue (lines 17-19), calls fibonacci (line 21), and prints the results taken from the second queue (lines 23-25). fibonacci gets data from the first queue (line 3) until it encounters None (lines 4-5), and puts the results into the second queue (line 10). Now that fibonacci is separated from main by the queues, we can move it into a separate process.
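A sketch of the two-queue version, still calling fibonacci directly, could look like this. The queue names requests and results are assumptions. Unlike a loop that stops when the queue looks empty, this sketch reads exactly one result per request, because the Python documentation warns that empty() on a multiprocessing.Queue is not reliable.

```python
from multiprocessing import Queue

NUMBERS = range(25, 32)  # hypothetical; adjust as described above


def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


def fibonacci(requests, results):
    # Read numbers from the first queue, write (n, fib(n)) to the second.
    while True:
        n = requests.get()
        if n is None:  # poison pill
            break
        results.put((n, fib(n)))


def main(numbers=NUMBERS):
    numbers = list(numbers)
    requests = Queue()
    results = Queue()
    for n in numbers:
        requests.put(n)
    requests.put(None)
    fibonacci(requests, results)  # still called directly, no new process yet
    collected = []
    # One result per request: no need to poll results.empty().
    for _ in numbers:
        n, value = results.get()
        print(f"fib({n}) = {value}")
        collected.append((n, value))
    return collected


if __name__ == "__main__":
    main()
```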
Change the line that calls fibonacci to:

Now fibonacci starts once the first queue has been filled, but the program doesn’t wait for it to finish; it immediately goes on to display the results. But it doesn’t work. Why?

Well, fibonacci wasn’t fast enough to compute any result, so the while loop (line 23) started while the results queue was still empty and quit too early. We can solve this with a second poison pill, sent by fibonacci.

Another thing to keep in mind is that the main program does not exit until the worker process ends. Luckily fibonacci does end (thanks to the poison pill), but by then nobody is reading the results any more, so they never show up. The second poison pill solves this too. At the end of fibonacci add:

and change the loop in main to:
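Putting the second poison pill and the new results loop together, a single-worker version could look like the sketch below, which starts fibonacci with multiprocessing.Process. Names carried over from the earlier sketches are assumptions. The worker is joined only after its results have been read, since the multiprocessing documentation warns that joining a process that still has items buffered for a queue can deadlock.

```python
from multiprocessing import Process, Queue

NUMBERS = range(25, 32)  # hypothetical; adjust as described above


def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


def fibonacci(requests, results):
    while True:
        n = requests.get()
        if n is None:
            break
        results.put((n, fib(n)))
    results.put(None)  # second poison pill: tells main this worker is done


def main(numbers=NUMBERS):
    requests = Queue()
    results = Queue()
    for n in numbers:
        requests.put(n)
    requests.put(None)
    worker = Process(target=fibonacci, args=(requests, results))
    worker.start()
    collected = []
    while True:
        item = results.get()  # blocks until the worker sends something
        if item is None:  # the worker has finished
            break
        print("fib({}) = {}".format(*item))
        collected.append(item)
    worker.join()  # safe now: everything the worker sent has been read
    return collected


# The guard is required where child processes re-import this module
# (the "spawn" start method, e.g. on Windows and macOS).
if __name__ == "__main__":
    main()
```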

It works again! Now all that is needed to have several processes work in parallel is to start more of them. Remember to add the necessary poison pills! The complete final version looks like this:

Line 22 adds the poison pill that notifies main that a fibonacci process has ended. Lines 30-32 start as many processes as there are CPUs (determined in line 29). Lines 36-37 add one poison pill for each fibonacci process. Lines 41-43 keep track of how many processes are still running.
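A sketch in the spirit of this final version follows; it is not the original listing. It starts one worker per CPU via multiprocessing.cpu_count(), sends one poison pill per worker, and counts the "worker ended" pills on the results side. The processes parameter is a hypothetical addition so the worker count can be overridden.

```python
from multiprocessing import Process, Queue, cpu_count

NUMBERS = range(25, 32)  # hypothetical; adjust as described above


def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


def fibonacci(requests, results):
    while True:
        n = requests.get()
        if n is None:
            break
        results.put((n, fib(n)))
    results.put(None)  # tell main that this worker has ended


def main(numbers=NUMBERS, processes=None):
    if processes is None:
        processes = cpu_count()  # one worker per CPU
    requests = Queue()
    results = Queue()
    for n in numbers:
        requests.put(n)
    workers = [Process(target=fibonacci, args=(requests, results))
               for _ in range(processes)]
    for worker in workers:
        worker.start()
    for _ in workers:
        requests.put(None)  # one poison pill per worker, after all the data
    running = len(workers)
    collected = []
    while running:  # keep reading until every worker has reported its end
        item = results.get()
        if item is None:
            running -= 1
            continue
        print("fib({}) = {}".format(*item))
        collected.append(item)
    for worker in workers:
        worker.join()
    return sorted(collected)  # workers finish in arbitrary order


if __name__ == "__main__":
    main()
```

Because the results arrive in whatever order the workers finish, the sketch sorts them before returning; printing happens in arrival order.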

Remember, this is not the only solution. For example, the results loop could simply count how many results it has received. The optimal number of processes depends on the problem. If the work is pure computation (as here), the CPU count is a good bet. If a substantial amount of time is spent waiting for network operations, more processes may be better. But if the bottleneck is a slow shared resource (such as a slow disk), more processes can actually reduce performance. Usually it’s easiest to just try it out.
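The counting alternative can be sketched like this, assuming every request produces exactly one result. Here fibonacci sends no final poison pill; main simply reads len(numbers) results. The processes parameter (hypothetical, defaulting to the CPU count) is where you would experiment with more or fewer workers.

```python
from multiprocessing import Process, Queue, cpu_count

NUMBERS = range(25, 32)  # hypothetical; adjust as described above


def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


def fibonacci(requests, results):
    # No final poison pill here: main counts results instead.
    while True:
        n = requests.get()
        if n is None:
            break
        results.put((n, fib(n)))


def main(numbers=NUMBERS, processes=None):
    numbers = list(numbers)
    processes = processes or cpu_count()  # try larger values for I/O-bound work
    requests = Queue()
    results = Queue()
    for n in numbers:
        requests.put(n)
    for _ in range(processes):
        requests.put(None)  # one poison pill per worker
    workers = [Process(target=fibonacci, args=(requests, results))
               for _ in range(processes)]
    for worker in workers:
        worker.start()
    # Exactly one result per request, so counting tells us when to stop.
    collected = [results.get() for _ in numbers]
    for worker in workers:
        worker.join()
    return sorted(collected)


if __name__ == "__main__":
    main()
```

The trade-off is that if a worker dies before producing its share of results, this loop blocks forever, which is one more reason to handle exceptions inside the workers.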

One more tip: when using multiple processes, remember that any of them can raise an exception when it encounters a problem. Always wrap the risky parts in try/except blocks and handle the exceptions. Also think about how to terminate the program early when necessary. Remember: the program ends only when all processes have ended.
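One way to apply this tip is sketched below. The input check in fib is a hypothetical addition so that some requests fail. Each worker catches exceptions per request and sends the error back instead of dying, and a finally clause guarantees the "worker ended" poison pill is sent, so main never waits forever for a worker that crashed.

```python
from multiprocessing import Process, Queue


def fib(n):
    # Hypothetical input check so that some requests can fail.
    if n < 0:
        raise ValueError(f"negative input: {n}")
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


def fibonacci(requests, results):
    try:
        while True:
            n = requests.get()
            if n is None:
                break
            try:
                results.put((n, fib(n), None))
            except Exception as exc:  # report the problem instead of dying
                results.put((n, None, repr(exc)))
    finally:
        results.put(None)  # main always learns that this worker ended


def main(numbers, processes=2):
    requests = Queue()
    results = Queue()
    for n in numbers:
        requests.put(n)
    for _ in range(processes):
        requests.put(None)
    workers = [Process(target=fibonacci, args=(requests, results))
               for _ in range(processes)]
    for worker in workers:
        worker.start()
    running = processes
    values, errors = {}, {}
    while running:
        item = results.get()
        if item is None:
            running -= 1
            continue
        n, value, error = item
        if error is None:
            values[n] = value
        else:
            errors[n] = error
    for worker in workers:
        worker.join()
    return values, errors


if __name__ == "__main__":
    values, errors = main([10, -1, 12])
    print(values)
    print(errors)
```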

  • http://pavelkarateev.com/ Pavel Karateev

    At least the first code snippet has errors: end and n in the main function are undefined, and start and end in the fibonacci function are useless.