Wednesday, September 27, 2006

Books: Scalable Internet Architectures

Scalable Internet Architectures

I can summarize this book:
  1. High availability and load balancing are as completely different as peanut butter and jelly.
  2. Spread is cool.
  3. Look, ma! I can write complex code in multiple languages!
Overall, however, I liked the book. It was all new material to me, and I'm glad I read it.

Friday, September 22, 2006

Databases: More Atomic Cluster Commits

I'm currently reading the book Scalable Internet Architectures, and I'm enjoying it.

On page 141, he describes two-phase commits:
The basic idea is that the node attempting the transaction will notify its peers that it is about to commit, and they will react by preparing the transaction and notifying the originating node that they are ready to commit. The second phase is noticing any aborts in the process and then possibly following through on the commit (hopefully everywhere)...The two-phase commit is not perfect, but it is considered sufficient for applications such as stock trading and banking...Despite its shortcomings, the only common "correct" multimaster replication technique used today is the 2PC, and, due to its overhead costs, it is not widely used for geographic replications.
I didn't catch the phrase "not perfect" the first time I read that section, so I spent the day wondering how the hell they implemented fully atomic transactions across a cluster of machines. What happens if all the machines agree that they're just about to finish the commit, and then one of the machines goes down right before writing the final bit to the transaction log while the others stay up and commit? I'll readily admit I'm no database expert, so I figured I must be missing something. I fought with it for a while, and I was very pleased to re-read that section and see the phrase "not perfect".
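To make that window concrete, here's a toy, single-process sketch of the two phases in Python. The `Participant` class, the `two_phase_commit` function, and the "in doubt" label are names I made up for illustration; they aren't from the book, and a real implementation would involve networking, timeouts, and durable logs:

```python
class Participant:
    """Hypothetical cluster node; not an API from the book."""

    def __init__(self, name, can_prepare=True, crash_before_commit=False):
        self.name = name
        self.can_prepare = can_prepare
        self.crash_before_commit = crash_before_commit
        self.state = "idle"

    def prepare(self, txn):
        # Phase 1: vote yes only if this node is able to commit.
        self.state = "prepared" if self.can_prepare else "aborted"
        return self.can_prepare

    def commit(self, txn):
        # Phase 2: the window I was worried about is a crash right here.
        if self.crash_before_commit:
            self.state = "crashed"
            return False
        self.state = "committed"
        return True

    def abort(self, txn):
        self.state = "aborted"


def two_phase_commit(participants, txn):
    # Phase 1: collect a vote from every node.
    votes = [p.prepare(txn) for p in participants]
    if not all(votes):
        for p in participants:
            p.abort(txn)
        return "aborted"
    # Phase 2: everyone voted yes, so tell everyone to commit.
    acks = [p.commit(txn) for p in participants]
    return "committed" if all(acks) else "in doubt"
```

If one node is created with `crash_before_commit=True`, the function returns "in doubt": the other nodes have already committed, which is exactly the inconsistency described above.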

Well, how could you do a perfect cluster-wide commit with absolutely no race conditions? Let's assume the machines are all near each other since 2PC doesn't work across the country anyway without a ton of lag. I have a silly brute force solution that may not be perfect, but is a bit closer to atomic (i.e. it relies on the hardware in the same way mutexes do):

Start with battery-backed RAID cards. However, design the RAID cards so that they have one input wire and one output wire specifically to address this problem. Now, put all the machines in a circuit using this wire:
_______________
|             |
|_N1_N2_N3_N4_|
On each of the RAID cards is a switch. If all the machines close their switches, the circuit is complete and current can flow. If any one of them leaves its switch open, current doesn't flow. It's the hardware equivalent of an AND operation; did I mention I'm not a hardware guy? Anyway, my commit system proceeds pretty much the same as the 2PC. When the machines have agreed that they're ready to do the final commit, the kernel sets up the sector to be written to disk and lets the RAID card take over. Then, they each close their switches. When the current starts flowing, each RAID card takes this as the signal to write the sector containing the data for the transaction log saying that the commit took place. If a RAID card is, for some reason, unable to perform its duties, it shuts down and declares that it's broken. I.e. the machine will die, but there won't be inconsistency.
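The closest software analogy I can think of is a barrier, so here's a rough Python sketch using `threading.Barrier` as a stand-in for the shared wire. The `run_cluster` function and the node names are hypothetical, and a barrier inside one process has none of the hardware guarantees the scheme depends on; it only shows the "nobody writes until everybody has flipped the switch" shape:

```python
import threading


def run_cluster(node_ok):
    """node_ok maps a (hypothetical) node name to whether that node is healthy."""
    barrier = threading.Barrier(len(node_ok))  # stands in for the shared wire
    log = {}                                   # node name -> final outcome
    lock = threading.Lock()

    def node(name, ok):
        try:
            if not ok:
                barrier.abort()          # this node never lets current through
                outcome = "broken"
            else:
                barrier.wait(timeout=5)  # flip the switch, wait for current
                outcome = "committed"    # current flowed: write the log sector
        except threading.BrokenBarrierError:
            outcome = "not committed"    # the circuit never completed
        with lock:
            log[name] = outcome

    threads = [threading.Thread(target=node, args=item) for item in node_ok.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

With every node healthy, all of them end up "committed". If any node is unhealthy, it reports "broken" and none of the others commit.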

Well, there may be holes in this scheme. Perhaps it was written about 30 years ago just like all the other interesting CS problems. It does require specialized hardware. All those reservations aside, I think this leads to a more atomic cluster commit.

Python: I Love Genshi!

I’ve totally fallen in love with Genshi! It's an XML templating engine for Python. I learned it between midnight and 2AM one night. By the next day, I was totally productive and totally loving it! I like the fact that template inheritance works so easily, and I love the XPath stuff. It's nice to be free of XSS vulnerabilities to some extent. I really didn't like TAL, so I was surprised to find that Genshi was so nice. It’s weird--Genshi is like a superset of all the templating engines, but in a way that is conceptually simple and elegant.

More about Genshi

Monday, September 18, 2006

Python: Stacked Thread-local Object

This is a cute trick: Registry for handling request-local module globals sanely.

It has all the ease of use of globals but with the thread-safety of shoving everything in a per-request object called ctx.
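Here's a minimal sketch of the idea as I understand it. `StackedProxy`, `handle_request`, and `greet` are names I invented for this example; this is not the Registry's actual API, just a toy built on `threading.local`:

```python
import threading
import types


class StackedProxy:
    """Toy module-level global that forwards to a per-thread stack of objects."""

    def __init__(self, name):
        # Bypass our own __setattr__ so these land on the proxy itself.
        object.__setattr__(self, "_name", name)
        object.__setattr__(self, "_local", threading.local())

    def _stack(self):
        # Each thread lazily gets its own stack.
        if not hasattr(self._local, "stack"):
            self._local.stack = []
        return self._local.stack

    def push(self, obj):
        self._stack().append(obj)

    def pop(self):
        return self._stack().pop()

    def _current(self):
        stack = self._stack()
        if not stack:
            raise RuntimeError("no object registered for %r in this thread" % self._name)
        return stack[-1]

    def __getattr__(self, attr):
        # Only called for attributes the proxy itself doesn't have.
        return getattr(self._current(), attr)

    def __setattr__(self, attr, value):
        setattr(self._current(), attr, value)


ctx = StackedProxy("ctx")  # looks like a plain module global


def greet():
    return "hello, " + ctx.user  # no ctx argument passed around


def handle_request(user):
    ctx.push(types.SimpleNamespace(user=user))
    try:
        return greet()
    finally:
        ctx.pop()
```

The stack matters when one request handler invokes another in the same thread: the inner one pushes its own object, and the outer one gets its object back after the pop.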

Friday, September 15, 2006

A Summary of Talks I've Attended Recently

I've been to a lot of talks and a conference recently.

I thought I would blog about ideas that were either:

  • New and interesting.
  • Not new, but came up a lot.

Attitude

  • Be passionate or go home.
  • Be a pain killer, not a vitamin.
  • Be pragmatic; working code is better than beautiful code.
  • What's in it for the user?

Startups Should Keep in Mind

  • Big, bloated companies are good targets.
  • There is lots of room for passion-centric communities.
  • We consistently see the pattern of boom, correction, lasting change.
  • Amateurs are becoming really powerful.

Getting Started

  • Keep talking to real users, not just your techie friends.
  • Don't go beta with something that sucks.
  • Raise less money, spend less money, hire slowly, fire quickly.
  • Developers should be in the same timezone.
  • Use specialists; for instance, don't waste a good engineer fighting XHTML/CSS browser issues.
  • Don't cut corners; take care of the details.

Attracting and Keeping an Audience

  • Make it dead simple to use.
  • Do your own support.
  • Make contacting you really easy; an email address is not enough.
  • Declaw your customers by being polite and apologetic when they are rude.
  • Groups begin to fail when there are more than 150 people.
  • Don't break APIs.
  • Support CSV.

User Experience and Design

  • Put off forcing people to sign in as long as possible.
  • It's all about discoverability, recoverability, context, and feedback.
  • Make it pretty.

Production Environment

  • Plan for maintenance.
  • Graph and measure like crazy; create a dashboard.
  • Do one-touch deploy; automate everything.
  • It's all about process.
  • Don't be special.
  • Design for debuggability.
  • Embrace cheap hardware; expect hardware failure.

User Contributed Content

  • Most people won't contribute--that's okay.
  • Spam is a deep problem.
  • A self-policing community is necessary, but not sufficient.

What's Cool

  • flickr
  • digg
  • Web APIs
  • Second Life
  • Creative Commons
  • Agile software development
  • Interoperability between Web and desktop
  • Adobe Apollo
  • memcached
  • Newsvine
  • TechCrunch

The most popular and impactful talks did not use traditional bullet point slides. Rather, they were mostly verbal, without reference to notes. Nonetheless, they were extremely well focused and structured. Slides often consisted of images licensed under Creative Commons found on flickr. The images set the mood. Also popular were slides that showed a single word or phrase. These slides were synchronized perfectly to the speech.

Tuesday, September 12, 2006

Free Culture: Lawrence Lessig Talk Now Available

Lawrence Lessig's keynote Free Culture: What we need from you is now available. It was the best keynote I've ever seen at LinuxWorld.

Take the time--it'll blow your mind! ;)

Erlang: Toying with Erlang

I'm really interested in distributed computing and concurrency right now. Given that I've been playing with Haskell so much, I thought I'd give Erlang a try since it's all about distributed computing, concurrency, and scalability.

I must admit that it already seems easier for me to read than Haskell. I'm not sure why. As I mentioned before, I do like Erlang-style concurrency, since strangely enough, that's how I had always thought things should work.

I'm fascinated by Mnesia, Erlang's distributed database. Unfortunately, according to this:
Mnesia is not perfect, of course, and its biggest downside at the moment is that its disc storage engine isn't suited for storing large volumes of data (Mnesia was designed for soft real-time applications where the data is stored mostly in RAM), but I hope this will be resolved in the not-too-distant future.
This is a major downer for me because I'm currently interested in terabyte-sized data sets. Nor is there a suitable MySQL driver.

Yaws is the Web application framework for Erlang. I can't say whether it's good or bad, but as a matter of taste, ever since I used Apache's Element Construction Set back in the day, I've had a particular dislike for generating HTML using programming language syntax as is done in Yaws:
%% the little status field in the upper left corner
head_status(User) ->
    T =
        {ehtml,
         {table, [],
          {tr, [],
           [{td, [{width, "30%"}],
             {table, [ {border, "1"}, {bgcolor, beige},{bordercolor, black}],
              [{tr, [], {td, [], pb("User: ~s", [User])}}
              ]}
            },
            {td, [{align, right}], {img, [{src, "junk.jpg"}
                                         ]}}
           ]
          }
         }
        }.
Yuck :(

Nonetheless, Erlang has already taught me new ways of thinking about distributed computing. I guess I'm wondering if there's anything I can't do the Erlang way in Python. After all, consider Candygram, which is an implementation of Erlang concurrency primitives in Python, although Candygram itself doesn't provide microthreads like Erlang. That reminds me, I'm still looking for an answer to my post Limitations of Coroutines via Enhanced Generators.

And if you're wondering--no, I don't know what the hell I'm talking about! I've only been looking at this stuff for like two days! I guess I should just shut up and go back to reading Erlang for C, C++ and Java Programmers. Hmm, I wonder where the tutorial for Python programmers is ;)

Wednesday, September 06, 2006

Erlang: Erlang Style Concurrency

I'm lovin' this article: http://www.defmacro.org/ramblings/concurrency.html.

It meshes nicely with my earlier post here. Erlang is like Stackless Python with coroutines in that it's a lightweight threading system built on top of asynchronous IO. However, it's different in that no data is shared between threads (i.e. there is no shared heap protected by locks). Instead, to share data, you must use message passing. This part matches my earlier post. Of course, the benefit is that it's trivial to do distributed computing with such systems.
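For a rough feel of the style in Python, here's a sketch using plain threads and `queue.Queue` as mailboxes. The `spawn`, `counter`, and `ask_total` names are mine, OS threads are nowhere near as lightweight as Erlang processes, and nothing in Python actually stops you from sharing data; the discipline is by convention only:

```python
import queue
import threading


def spawn(func, *args):
    """Start func in a thread with its own mailbox; return the mailbox."""
    mailbox = queue.Queue()
    threading.Thread(target=func, args=(mailbox,) + args, daemon=True).start()
    return mailbox


def counter(mailbox):
    # All state is local to this "process"; others reach it only via messages.
    total = 0
    while True:
        msg = mailbox.get()
        if msg[0] == "add":
            total += msg[1]
        elif msg[0] == "get":
            msg[1].put(total)  # reply on the sender's mailbox
        elif msg[0] == "stop":
            return


def ask_total(pid):
    # Send our own mailbox along so the counter knows where to reply.
    reply = queue.Queue()
    pid.put(("get", reply))
    return reply.get(timeout=5)
```

Usage looks like `pid = spawn(counter)`, then `pid.put(("add", 3))`, then `ask_total(pid)`. No locks appear anywhere in the user code because the only thing crossing thread boundaries is messages.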

Anyway, Erlang has been around a long time and is incredibly robust. I'd love to get a chance to use it.