Wednesday, May 23, 2012

Async: To Be or Not To Be

Just because I have to use a callback-oriented style on the client doesn't mean I want to use a callback-oriented style on the server. Now, before anyone gets all upset and tells me that I don't know the difference between async and a kitchen sink, let me explain :)

The client is necessarily an event-oriented place. If I don't know which button the user is going to press, it makes a lot of sense to use a different callback for each button. The server is different. If I'm waiting for the result of a database query before I can continue processing a request, it sure is convenient to just block and wait.
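
To make the contrast concrete, here is a minimal sketch of the two styles. Everything in it is made up for illustration: `on`, `dispatch`, `fake_query`, and `handle_request` are hypothetical names, and `fake_query` is just a stand-in for a real blocking database call.

```python
# Client style: register one callback per button and run whichever
# one matches the event that eventually arrives.
handlers = {}

def on(button, callback):
    handlers[button] = callback

def dispatch(button):
    return handlers[button]()

on("save", lambda: "saving")
on("cancel", lambda: "cancelling")

# Server style: each step needs the previous result, so the code
# simply waits for it and carries on.
def fake_query(user_id):
    # Hypothetical stand-in for a blocking database query.
    return {"id": user_id, "name": "user%d" % user_id}

def handle_request(user_id):
    row = fake_query(user_id)  # block and wait for the result
    return "Hello, %s" % row["name"]
```

In the first half I can't know which handler will run next. In the second half the order is fixed, which is why reading it top to bottom works.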

My key point is that it's important to separate the style you want to code in from the performance and scalability characteristics you want. You shouldn't necessarily pick a callback-oriented style just because you want the performance and scalability characteristics of asynchronous networking APIs.

My two favorite examples are gevent and Erlang, but Go is similar. When you code using gevent or Erlang, your code looks like synchronous, blocking code. However, under the covers, they use asynchronous networking APIs. Now, before anyone tells me that it's impossible, buggy, or that it'll never work, let me point out that these tricks have been in production for decades at Ericsson, Yahoo Groups, and IronPort/Cisco.
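
Here's a rough sketch of the idea using only the standard library. It is not how gevent or Erlang are implemented: gevent hides the switch inside its I/O calls, whereas this toy makes the switch visible with `yield`. The names `handler` and `run` are hypothetical, and the "I/O" is faked by handing back a string.

```python
from collections import deque

def handler(name, log):
    # Reads top to bottom, like blocking code.
    log.append(name + ": send query")
    reply = yield "query for " + name  # pause until the fake I/O completes
    log.append(name + ": got " + reply)

def run(tasks):
    # Toy round-robin scheduler standing in for an event loop.
    ready = deque((task, None) for task in tasks)
    while ready:
        task, value = ready.popleft()
        try:
            request = task.send(value)
        except StopIteration:
            continue  # this handler has finished
        # Pretend the I/O finished; resume the task later with its result.
        ready.append((task, "result of " + request))

log = []
run([handler("a", log), handler("b", log)])
```

Each handler is written as one straight-line function, yet the log shows both queries being sent before either reply is handled. That interleaving is what the asynchronous machinery underneath buys you.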

Furthermore, I should point out that asynchronous networking APIs aren't a perfect fit for every problem. For instance, if your goal is to send 10 gigabytes of information to another server, it turns out that synchronous networking APIs will actually outperform asynchronous networking APIs. Asynchronous networking APIs are so popular because they can handle a larger number of clients than synchronous networking APIs can, and because they use less memory than a large number of threads, each of which needs its own stack. gevent and Erlang can handle a large number of clients, don't use up much memory, and don't require a real OS-level stack per client.
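
As a small illustration of the "many clients, no thread per client" point, here is a sketch using Python's standard `asyncio` module. Note that `asyncio` is swapped in for gevent here so the example needs nothing outside the standard library; it is a different tool that requires an explicit `await` at every switch point. The names `client` and `serve` are hypothetical, and `asyncio.sleep(0)` merely stands in for waiting on the network.

```python
import asyncio

async def client(i):
    # Stand-in for waiting on the network; yields control to the event loop.
    await asyncio.sleep(0)
    return i

async def serve(n):
    # Run n lightweight tasks on one event loop, with no OS thread per client.
    results = await asyncio.gather(*(client(i) for i in range(n)))
    return len(results)

handled = asyncio.run(serve(10000))
```

Starting ten thousand OS threads for the same job would mean ten thousand stacks. Here each client is just a small object scheduled on a single event loop.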

So what's my problem with the callback-oriented style? I find it a lot harder to read. I've coded projects in Twisted, Node.js, etc., and I prefer the gevent approach. You get roughly the same performance and scalability characteristics, but with code that is much easier to read. Of course, what's readable to me may not be readable to other people. I've met people who are perfectly happy using Twisted Web 1 and don't think that callback-oriented code poses any real challenge.

If you're interested in hearing more about my thoughts on async and concurrency, check out my other blog posts, which include a link to my Dr. Dobb's Journal article on Python concurrency.

9 comments:

verte said...

I know that a lot of arguments about async these days are around performance, but the primary motivation for it actually is conceptual. I'm sure you've read "the problem with threads", since it gets thrown around on #python quite a bit, but a better one-page explanation of the conceptual simplicity of async is the distributed computing example in E.

I imagine the sync/async divide is similar to the functional/stateful divide, where there are implementation details that drive performance issues, but the more interesting aspect is a matter of how we think about problems, and what problems become significantly simpler to understand when posed asynchronously. Something you may have considered is what it would take to design a sensible memory model for Python or JavaScript in a threaded vs an async world (if you imagine they are mutually exclusive).

Sam Rushing said...

Event-driven stuff doesn't scale. 8^)

You might be able to write an event-driven HTTP server. But once you try to combine it with an event-driven DNS resolver and an event-driven database client, the state space has exploded.

You might then write a set of tools - help from the compiler or runtime - to essentially reinvent a cooperative threading package. And that works great. But at some point the difference between your threading package and the next one comes down to semantics.

The true Holy Grail of [server] scalability would be to route around the barrier presented by the operating system [which is itself a coroutine system] and have an OS with no kernel/user wall and without artificial limits on scalability. Depending on your politics, you could call it an "in-kernel server" or a "user-space TCP stack".

Peter Zsoldos said...

I don't have enough experience with C# 5's await keyword to judge it properly, but it sure makes for an easier read. I wonder if something like that would be possible with some helper function in Python... E.g.: with await(asyncMethodInvocation): process result

And the point of async not always being the right solution is certainly a valid one!

Shannon Behrens said...

verte, sorry I haven't read anything more than the abstract for "the problem with threads", and I haven't read "the distributed computing example in E." It's going to take me a while to get to those. In the meantime, do you care to summarize?

I actually wasn't arguing for threads. I don't actually like threads. Erlang has something it calls processes, which I think is ideal. gevent has greenlets, which are actually a lot more deterministic than threads.

Shannon Behrens said...

Peter, C#'s await keyword reminds me of EventMachine in Ruby. It looks like it's a way to use blocks as callbacks. Is that right?

I do think having blocks makes callback-oriented programming easier to read, but it's still not as easy to read as, say, gevent's approach.

verte said...

The distributed computing example is a summary, so I won't try to summarise it here. The introduction is about a page.

"The problem with threads" deals with the explosive nondeterminism resulting from shared-state concurrency. To be clear, I would also consider gevent to be shared-state concurrency*, in that you can't look at a function and tell from its body whether it contains a context switch or not - that could be hidden away inside some function that we call.

This is a significant burden on the programmer. As someone who maintains a moderately-sized Swing application rife with concurrency bugs, I think I can say that until programmers are forced to think about concurrency from the outset, maintenance is a battle between introducing new code and trying to figure out the way it interacts with existing locks and tasks.

But don't take my word for it: the article mentions a concrete example of an application written by concurrency experts that mysteriously deadlocked once they bought a machine with more cores.

In general, I'm all for runtimes and compilers that figure out details so you don't have to. I like dynamic types, I like garbage collection. But concurrency is a more complicated subject, I think, and it deserves very explicit language from the programmer. (Concurrent Haskell is an interesting example - the language is functional, and the concurrency features serve only to give the programmer greater control over communication.)

* Important side note: finalisers actually introduce shared-state concurrency in many languages, Python included. See e.g. unexpected concurrency

Shannon Behrens said...

verte, thank you for the excellent comment! In general, I agree with you.

gevent is non-deterministic, but it's not as bad as threads, which can context switch at any time. Since it can only context switch when doing IO, the problem isn't nearly as heinous. Sure, that's not perfect, but it's a lot easier for me to wrap my brain around than threads.

As for multi-threading in Swing, I wrote some quick tricks here (http://jjinux.blogspot.com/2007/12/python-some-concurrency-tricks.html). Basically, I avoid mutable, shared state like the plague.

gus said...

Since you mentioned it, I'm curious why IronPort didn't use Erlang instead of developing a new concurrency framework for a slow interpreted language like Python.

If you were starting today (2013), do you think Erlang would be the right tool for network appliances like IronPort's?

BTW, great blog!

Cheers,

Gus

Shannon Behrens said...

Thanks, Gus. Erlang wasn't as popular back then as it is now. My guess is that none of the early IronPort people even knew about it. In contrast, Sam Rushing already knew how to solve the async problem in Python.

Python will never be as good as Erlang at what Erlang does. Hence, for certain network servers, it makes a lot of sense to use Erlang. However, Python has so many other advantages that it probably makes sense to use Python (and gevent) as the "main" language for a company.