Thursday, February 27, 2014

Python: A Response to Glyph's Blog Post on Concurrency

If you haven't seen it yet, Glyph wrote a great blog post on concurrency called Unyielding. This blog post is a response to that blog post.

Over the years, I've tried each of the approaches he talks about while working at various companies. I've known all of the arguments for years. However, I think his blog post is the best blog post I've read for the arguments he is trying to make. Nice job, Glyph!

In particular, I agree with his statements:

What I hope I’ve demonstrated is that if you agree with me that threading has problematic semantics, and is difficult to reason about, then there’s no particular advantage to using microthreads, beyond potentially optimizing your multithreaded code for a very specific I/O bound workload.

There are no shortcuts to making single-tasking code concurrent. It's just a hard problem, and some of that hard problem is reflected in the difficulty of typing a bunch of new concurrency-specific code.

In this blog post, I'm not really disputing his core message. Rather, I'm just pointing out some details and distinctions.

First of all, it threw me off when he mentioned JavaScript since JavaScript doesn't have threads. In the browser, it has web workers which are like processes, and in Node, it has a mix of callbacks, deferreds, and yield. However, reading his post a second time, all he said was that JavaScript had "global shared mutable state". He never said that it had threads.

The next thing I'd like to point out is that there are some real readability differences between the different approaches. Glyph did a good job of arguing that it's difficult to reason about concurrency when you use threads. However, if you ignore race conditions for a moment, I think it's certainly true that threads, explicit coroutines, and green threads are easier to read than callbacks and deferreds. That's because they let you write code in a more traditional, linear fashion. Even though I can do it, using callbacks and deferreds always causes my brain to hurt ;) Perhaps I just need more practice.
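To make the contrast concrete, here's a minimal sketch (not from Glyph's post) of the same "fetch, then process" logic in both styles; agent_get, agent_get_blocking, process, and log_error are hypothetical stand-ins:

# Callback/deferred style: control flow is turned inside out.
def fetch_and_process(url):
    d = agent_get(url)  # hypothetical; returns a Deferred-like object
    d.addCallback(process)
    d.addErrback(log_error)
    return d

# Linear style (threads, green threads, or explicit coroutines):
# it reads top to bottom like ordinary code.
def fetch_and_process_linear(url):
    try:
        body = agent_get_blocking(url)  # hypothetical blocking call
        process(body)
    except IOError as e:
        log_error(e)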

Another thing to note is that the type of application matters a lot when you need to address concurrency concerns. If you're building a UI, for instance, you don't want any computationally heavy work to be done on the UI thread. In Android, you do as little CPU-heavy and IO-heavy work as possible on the UI thread, and instead push that work off onto other threads.
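The same idea applies in Python GUI toolkits. Here's a minimal sketch (Python 3, with made-up busywork standing in for real work) of keeping a Tkinter UI responsive by pushing the heavy lifting onto a worker thread and polling a queue from the UI thread:

import queue
import threading
import tkinter as tk

results = queue.Queue()

def expensive_work():
    # Runs on the worker thread; it never touches the UI directly.
    results.put(sum(i * i for i in range(10 ** 7)))

def poll_results():
    # Runs on the UI thread via after(); safe to update widgets here.
    try:
        label.config(text="Result: %d" % results.get_nowait())
    except queue.Empty:
        root.after(100, poll_results)  # check again in 100ms

root = tk.Tk()
label = tk.Label(root, text="Working...")
label.pack()
threading.Thread(target=expensive_work).start()
root.after(100, poll_results)
root.mainloop()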

Other things to consider are IO bound vs. CPU bound and stateful vs. stateless.

Threads are fine, if all of the following are true:

  • You're building a stateless web app.
  • You're IO bound.
  • All mutable data is stored in a per-request context object, in per-request instances, or in thread-local storage (see the sketch after this list).
  • You have no module-level or class-level mutable data.
  • You're not doing things like creating new classes or modules on the fly.
  • In general, threads don't interact with each other.
  • You keep your application state in a database.
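As a sketch of the per-request rule in that list, here's what keeping mutable data in thread-local storage looks like; handle_request and do_work are hypothetical:

import threading

# Each thread sees its own copy of the attributes set on this object,
# so two requests being handled simultaneously can't stomp on each
# other's state.
request_context = threading.local()

def handle_request(user_id):
    request_context.user_id = user_id
    request_context.items_processed = 0
    do_work()

def do_work():
    # No locks needed: this state belongs to the current thread only.
    request_context.items_processed += 1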

Sure, there's always going to be some global, shared, mutable data such as sys.modules, but in practice, Python itself protects that using the GIL.

I've built apps such as the above in a multithreaded way for years, and I've never run into any race conditions. The difference between this sort of app and the app that led to Glyph's "buggiest bug" is that he was writing a very stateful application server.

I'd also like to point out that it's important not to overlook the utility of UNIX processes. Everyone knows how useful the multiprocessing module is and that processes are the best approach in Python for dealing with CPU-bound workloads (because you don't have to worry about the GIL).
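For example, here's a minimal sketch of farming a CPU-bound function out to a pool of worker processes; count_primes is just a stand-in for real work:

from multiprocessing import Pool

def count_primes(n):
    # Deliberately CPU-heavy: count primes below n by trial division.
    count = 0
    for candidate in range(2, n):
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return count

if __name__ == '__main__':
    pool = Pool()  # defaults to one worker process per CPU core
    print(pool.map(count_primes, [10000, 20000, 30000, 40000]))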

However, using a pre-fork model is also a great way of building stateless web applications. If you have to handle a lot of requests, but you don't have to handle a very large number simultaneously, pre-forked processes are fine. The upside is that the code is both easy to read (because it doesn't use callbacks or deferreds), and it's easy to reason about (because you don't have the race conditions that threads have). Hence, a pre-fork model is great for programmer productivity. The downside is that each process can eat up a lot of memory. Of course, if your company makes it to the point where hardware efficiency costs start outweighing programmer efficiency costs, you have what I like to call a "nice to have problem". PHP and Ruby on Rails have both traditionally used a pre-fork approach.
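Here's a minimal sketch of the pre-fork shape on UNIX, stripped of everything a real server would add (it just serves a canned HTTP response):

import os
import socket

NUM_WORKERS = 4

# The parent opens the listening socket once, before forking, so all
# of the workers inherit it and can accept connections from it.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(('127.0.0.1', 8080))
listener.listen(128)

for _ in range(NUM_WORKERS):
    if os.fork() == 0:  # child process: becomes a worker
        while True:
            conn, addr = listener.accept()
            conn.sendall(b'HTTP/1.0 200 OK\r\n\r\nhello\r\n')
            conn.close()

# The parent just waits on its workers.
for _ in range(NUM_WORKERS):
    os.wait()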

I'm also a huge fan of approaches such as Erlang's that give you what is conceptually a process, without the overhead of real UNIX processes.

As Glyph hinted at, this is a really polarizing issue, and there really are no easy, perfect-in-every-way solutions. Any time concurrency is involved, there are always going to be some things you need to worry about regardless of which approach to concurrency you take. That's why we have things like databases, transactions, etc. It's really easy to fall into religious arguments about the best approach to concurrency at an application level. I think it's really helpful to be frank and honest about the pros and cons of each approach.

That being said, I do look forward to one day trying out Guido's asyncio module.
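For reference, here's roughly what that style looks like in Python 3.4's asyncio (brand new as I write this); two simulated requests run concurrently on a single thread, and asyncio.sleep stands in for real network IO:

import asyncio

@asyncio.coroutine
def fetch(name, delay):
    yield from asyncio.sleep(delay)  # stands in for real network IO
    return 'result from %s' % name

@asyncio.coroutine
def main():
    # Both "requests" make progress at the same time on one thread.
    results = yield from asyncio.gather(fetch('a', 1), fetch('b', 1))
    print(results)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())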


7 comments:

James said...

This is a really interesting read and I can't recall whether or not I read Glyph's blog post on the matter! I will read it today sometime... I have a great level of respect for Glyph and the work he's done.

It would be interesting for me, I think, to write about this as well, since this is now the 10th year of development for a kind of concurrency (application) framework (circuits). I and many developers over the years have tried very hard to make the so-called "brain hurting" of callbacks and deferreds as minimal as possible. In short, circuits does not employ the use of callbacks and deferreds per se, but rather events and promises, which I've found much easier to both reason about and follow.

I guess I'll re-read both blog posts (yours and Glyph's) and formulate my own response as well :) Another 2c worth can't hurt? :)

cheers
James

Shannon Behrens said...

Sounds good. Nice to meet you, by the way :)

I'd be interested in hearing how deferreds in Twisted differ from promises in Circuits. Usually those terms are treated as synonymous.

Also, registering to listen for an event is a lot like registering yourself with a deferred. At some level, you're still registering a callback, right?

Your description of Circuits reminds me of Flight.js which a buddy of mine wrote.

James said...

For a minute there I thought you knew both me and circuits :) But I mistook your blog for this blog post: http://mindref.blogspot.com.au/2012/09/python-fastest-web-framework.html -- which had a very similar theme/style to your own blog :)

Re deferreds vs. promises -- I always took deferreds in Twisted (at least) to be more tightly bound to callbacks (callbacks and errbacks), where you "register" a function that gets called upon success or failure of some part of a chain.

In contrast, circuits (whilst it has the notion of callbacks, if you will, in its core -- we call them event handlers) has promises, which behave more like proxied values that become the value when ready. Since circuits started its development in ~2002 and became "circuits" the name and branded project in 2004, it shared nothing in common with Twisted in terms of design or behavior.

cheers
James

James said...

Oh yes and nice to meet you too! :)

cheers
James

Shannon Behrens said...

I see, so the way your code is called is different. However, you still can't write linear code such as:

value = query_value_from_database()

print value

right?

James said...

Actually in circuits you can :)

cheers
James

James said...

My apologies for the short reply, but you can do something like this in circuits-3.0.0.dev (3.0 being released soon):

class App(Component):

    def bar(self):
        return "Hello World!"

    def foo(self):
        x = yield self.call(Bar())
        print(x)

A contrived and trivial example, I know; but the core concepts here show how you could scale this to something far more complex, e.g. firing an event to a database engine that performs several I/O operations and does some CPU-bound work before returning a result, waiting for the completion of that event, and returning a result.

circuits 3.x utilizes Python generators to create a sort of coroutine-based control structure on top of its event-driven architecture. So you can do things like wait for an event to occur, call an event synchronously (asynchronously underneath), etc.

This allows us to reason about "event handlers" and what they do with the "event" (data) as they participate in a large system. Scale in terms of complexity is derived from building larger, more complex components from simpler ones, as demonstrated by circuits.web and many applications written atop this.

cheers
James