Thursday, May 22, 2008

Rant: UNIX vs. the Web

For all its strengths, developing for the Web has become a gigantic pain in the rear, especially when compared to the Unix style of development.

A few months ago, I joined a new startup. Almost all of it is backend processing that doesn't even use a database until almost all the work is done. Our only Web application is for a Web services API. Since I was mostly writing the code from scratch, my boss and I agreed that taking a Unix approach was best. Hence, we have a bunch of simple, standalone tools. Writing such tools is so refreshing. You know exactly what you need to do. They're only a few hundred lines long. You build a nice command line interface using optparse, you write some tests using nose, etc. It's all very straightforward and linear. Wanna know how to do a UNIX-style mashup? You use a pipe.
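Here's a minimal sketch of the sort of tool described above. The filter itself (upper-casing lines that contain a pattern) and the name `upcase.py` are made up for illustration. The point is the shape: optparse for the command line, stdin in, stdout out, and the real logic in a plain function that's easy to test.

```python
#!/usr/bin/env python
"""upcase.py: a tiny UNIX-style filter (hypothetical example).

Reads lines from stdin (or the named files), optionally keeps only the
lines containing a pattern, upper-cases them, and writes them to stdout,
so it can sit anywhere in a pipeline, e.g.:

    cat access.log | ./upcase.py --match get | sort | uniq -c
"""
import sys
from optparse import OptionParser


def build_parser():
    parser = OptionParser(usage="%prog [options] [FILE...]")
    parser.add_option("-m", "--match", dest="match", default=None,
                      metavar="PATTERN",
                      help="only emit lines containing PATTERN")
    return parser


def process(lines, match=None):
    """Yield transformed lines; kept separate from I/O so it's easy to test."""
    for line in lines:
        if match is None or match in line:
            yield line.upper()


def main(argv=None):
    # argv=None makes optparse fall back to sys.argv[1:].
    options, args = build_parser().parse_args(argv)
    if args:
        for name in args:
            with open(name) as f:
                sys.stdout.writelines(process(f, options.match))
    else:
        sys.stdout.writelines(process(sys.stdin, options.match))
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

A nose-style test is then just a function named `test_...` that feeds `process()` a list of strings and checks what comes out; no Web server, database, or browser required.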

Recently, I went to work on a Web application again, and I realized just how much of a giant pain in the rear it is. Here are some of the things you need to think about. You need to be fluent in Python (or PHP, Ruby, etc.), HTML, CSS, SQL, and JavaScript. You have to have a Web server, a database, and possibly other auxiliary items like a load balancer. You have to think of the front end and the back end. Usually, you'll use a Web framework, a templating engine, and possibly an ORM. They're all probably young projects, so you'll have to be on the mailing list for each. Despite all the time you'll invest in these projects, you probably won't still be using them in five years.

Besides implementation concerns, the Web is simply complex these days. Have you read all the papers on session fixation attacks, cross-site scripting vulnerabilities, SQL injection attacks, and cross-site request forgery attacks? Do you remember the HTTP response codes for SEE_OTHER and TEMPORARY_REDIRECT? Do you know when you should use each? I see tons of books about various Web frameworks and libraries, but where is there a really good book on how to be good at plain old Web development?
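For the record, SEE_OTHER is 303 and TEMPORARY_REDIRECT is 307. The short version: 303 tells the browser to fetch the Location with a GET (what you usually want after a successful form POST, so hitting reload doesn't resubmit the form), while 307 tells it to repeat the same request, method and body included, at the new Location. Here's a minimal WSGI sketch; the `/comments` and `/old-comments` routes are hypothetical.

```python
from http import HTTPStatus


def status_line(status):
    """Format an HTTPStatus as a WSGI status string, e.g. '303 See Other'."""
    return f"{status.value} {status.phrase}"


def app(environ, start_response):
    method = environ["REQUEST_METHOD"]
    path = environ.get("PATH_INFO", "/")

    if path == "/comments" and method == "POST":
        # ... save the submitted comment here (omitted) ...
        # 303: "now GET the result page". Reloading that page won't resubmit
        # the form (the Post/Redirect/Get pattern).
        start_response(status_line(HTTPStatus.SEE_OTHER),
                       [("Location", "/comments")])
        return [b""]

    if path == "/old-comments":
        # 307: "repeat this exact request (same method, same body) over there".
        start_response(status_line(HTTPStatus.TEMPORARY_REDIRECT),
                       [("Location", "/comments")])
        return [b""]

    start_response(status_line(HTTPStatus.OK),
                   [("Content-Type", "text/plain; charset=utf-8")])
    return [b"comments go here\n"]
```

Both codes were added in HTTP/1.1 because 302 had become ambiguous in practice: browsers were turning redirected POSTs into GETs even though the spec said not to change the method.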

How do you deal with logins? OpenID hasn't really taken off yet, and not everyone can depend on Facebook for authentication. That means you'll need account services. What if the user forgets his password? Did you know that if you URL encode something that's been base64 encoded and then send it in an email, it might not make it through Hotmail in all cases? However, if you make the link too long, users will get confused when their email client breaks it into two lines.
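Part of the problem is that standard base64 uses `+`, `/`, and `=`, all of which mean something in a URL, so you end up URL-encoding the result. One way to skip that step (a sketch, not a guaranteed fix for any particular mail client) is to generate the token with the URL-safe base64 alphabet from the start. The `example.com/reset` endpoint below is hypothetical.

```python
import base64
import secrets

RESET_URL = "https://example.com/reset"  # hypothetical endpoint


def make_reset_link(nbytes=16):
    """Return (token, link) for a password-reset email.

    secrets.token_urlsafe() encodes random bytes with the URL-safe base64
    alphabet ('-' and '_' instead of '+' and '/') and strips the '='
    padding, so the token can go straight into a query string without any
    additional URL encoding.
    """
    token = secrets.token_urlsafe(nbytes)
    return token, f"{RESET_URL}?token={token}"


# For comparison: the same three bytes in both alphabets.
raw = b"\xfb\xff\xfe"
print(base64.b64encode(raw))          # b'+//+'
print(base64.urlsafe_b64encode(raw))  # b'-__-'
```

Sixteen random bytes (128 bits) gives a 22-character token, which keeps the link reasonably short. The `secrets` module needs Python 3.6 or later.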

The browser is now one of the most complicated pieces of software on a standard desktop. Everyone knows the DOM is a mess. How do you respond to events? There are at least three ways, and they're all painful for various reasons unless you're using a JavaScript framework. XHR, which is now a staple in the Web world, started life as a Microsoft hack. innerHTML is another such hack. Yet as useful and convenient as it is, it hasn't been blessed by the standards bodies. Seriously, who the heck thought of createElement, setAttribute, etc. in order to inject some HTML? It's very Java-ish and not very JavaScript-ish at all.

By the way, concerning standards bodies--you know, the ones who spent so much time creating XHTML?--I'll remind you that the Mozilla Web Author FAQ still says that "Serving valid HTML 4.01 as text/html ensures the widest browser and search engine support." I.e., use HTML, not XHTML. The WebKit guys say pretty much the same thing.

But the fun doesn't end there. Need to do a Web request against a foreign domain and JavaScript won't let you? There's a workaround for that too. You just use a script tag and JSONP.
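The trick works because a `<script src=...>` tag isn't subject to the same-origin restriction that XHR is, so the server wraps its JSON in a call to a function the page has already defined. Here's a minimal server-side sketch; the `callback` query parameter name is a common convention rather than a standard, and the identifier check is one reasonable way to keep clients from injecting arbitrary script.

```python
import json
import re
from urllib.parse import parse_qs

# Only allow plain (optionally dotted) JavaScript identifiers as callback
# names, so a client can't smuggle arbitrary script into the response.
_CALLBACK_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*(\.[A-Za-z_$][A-Za-z0-9_$]*)*")


def jsonp_body(data, query_string, param="callback"):
    """Return (content_type, body) for a JSON or JSONP response.

    With ?callback=handleData the JSON is wrapped as handleData({...});
    which a page on another domain can load via a script tag. Without a
    callback, plain JSON is returned.
    """
    payload = json.dumps(data)
    callback = parse_qs(query_string).get(param, [None])[0]
    if callback is None:
        return "application/json", payload
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError(f"invalid JSONP callback name: {callback!r}")
    return "application/javascript", f"{callback}({payload});"
```

The browser side is just a script tag pointing at the URL with `?callback=handleData`, plus a global `handleData` function to receive the data.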

The Web is a strange place where the HACKS become the standard by which you get stuff done.

Once you get past all those difficulties, you're still stuck with something that's not as snappy as a desktop app and a lot more difficult to code than an old-school (i.e., no JavaScript) Web app. Essentially, the tough thing about Web apps is that they're large and effectively connectionless. Unless you're using something like Seaside, you have to handle each new Web request completely from scratch.

Hello, who are you? Oh, you have a cookie? Let me see if I know anything about you. Oh, the memcache server says it has a session for you. Let me talk to the database to see if he can tell me more. Ok, here's a form. Get back to me when you're ready.

Hello, who are you? Oh, you have a cookie? Let me see if I know anything about you...

And let's not forget that you're simultaneously carrying on about a hundred such conversations at any given time. You have all the drawbacks of multithreaded coding, except that you can't really count on anything being in memory because you're spread across several servers. It's the worst of both worlds.
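To make that conversation concrete, here's a minimal sketch of what every single request has to do before your code can even say hello. Plain dicts stand in for memcached and the database, and the `sid` cookie name, `load_session()`, and `start_session()` are all hypothetical. In a real deployment the stores live out of process precisely because the next request may land on a different server.

```python
import secrets
from http.cookies import SimpleCookie

# Stand-ins for shared, out-of-process services (memcached and a database
# reachable from every app server). Plain dicts are used here only so the
# sketch runs on its own.
memcache = {}
database = {"sessions": {}, "users": {"u1": {"name": "Alice"}}}


def start_session(user_id):
    """Create a session and return the Set-Cookie header value."""
    sid = secrets.token_urlsafe(16)
    database["sessions"][sid] = {"user_id": user_id}
    cookie = SimpleCookie()
    cookie["sid"] = sid
    cookie["sid"]["httponly"] = True
    return cookie["sid"].OutputString()


def load_session(cookie_header):
    """Rebuild everything we know about this client from scratch.

    Nothing from the previous request can be assumed to be in this
    process's memory.
    """
    morsel = SimpleCookie(cookie_header or "").get("sid")
    if morsel is None:
        return None                                  # "Hello, who are you?"
    sid = morsel.value
    session = memcache.get(sid)                      # ask memcache first
    if session is None:
        session = database["sessions"].get(sid)      # then the database
        if session is None:
            return None
        memcache[sid] = session                      # warm the cache
    return {"sid": sid, "user": database["users"].get(session["user_id"])}
```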

The nice thing about UNIX tools is that once you get them working, you don't need to think much about them anymore. When was the last time you worried much about cut, uniq, or sort? On the other hand, with a Web app, plan on rewriting it five years from now. Oh, and it'll be even harder and more messed up by then.

Of course, all my complaints just don't matter because the Web has too many good properties. It's vendor and OS neutral. You can run millions of different applications, and the only thing you need to download is a browser. (Oh wait, you already have one? An ancient version of IE? No worries, we can support that too!)

Yet again, we are reminded that worse is better. Apparently, much, much worse is also much, much better.

14 comments:

Shannon -jj Behrens said...

The thing I love most about JavaScript is that there's always more to learn. There's always some feature that you need to learn about fully so that you'll know not to use it.

darin said...

The trouble began when we decided to write stateful applications on top of a stateless protocol. There was nowhere to go but down. ;)

Jim said...

You aren't really comparing like and like though, are you?

What do you do about authentication in UNIX? What if a user forgets their password then? You still need to handle it, it's not like it's something unique to the web.

You need to know JavaScript for the web? Well you need to know Bourne for UNIX. And sed. And awk. And grep.

Do I remember the HTTP response codes for See Other and Temporary Redirect? How is that different to memorising the signals to make a daemon re-read its config? Is See Other any more difficult to remember than SIGHUP?

Securing yourself against session fixation attacks? Well if you're coding in UNIX, I hope you pay attention to how you handle symlinks and temp files.

SQL injection attacks? If you're coding in UNIX, I hope you're paying attention to your shell variables and quoting them properly.

As for:

> The Web is a strange place where the HACKS become the standard by which you get stuff done.

Isn't UNIX the very system Worse is Better was written about?

I completely agree that there is a hell of a lot to remember when web programming. But I think you're blinded by familiarity to the equivalent complexity present in UNIX applications.

The simple fact of the matter is that the complexity of developing a solution to a problem is primarily caused by the complexity of the problem rather than the platform. Some platforms make it easier than others, sure, but the web isn't significantly different to UNIX in this respect. Your article only makes it seem that way because it highlights all the complexity in the web while ignoring the equivalents in UNIX.

Shannon -jj Behrens said...

> You aren't really comparing like and like though, are you?

Loosely. I'm comparing what it "feels like" to write a Web app compared to what it "feels like" to develop UNIXy apps.

> What do you do about authentication in UNIX? What if a user forgets their password then? You still need to handle it, it's not like it's something unique to the web.

You use PAM. Every app on the system can make use of PAM.

> You need to know JavaScript for the web? Well you need to know Bourne for UNIX. And sed. And awk. And grep.

Yes, but they aren't very hard. They're all pretty straightforward. I learned sed and awk plenty well by simply reading a chapter on each in a UNIX book.

> Do I remember the HTTP response codes for See Other and Temporary Redirect? How is that different to memorising the signals to make a daemon re-read its config?

HUP. They all use HUP.
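The convention being referred to: a daemon installs a SIGHUP handler and re-reads its configuration when it gets one (`kill -HUP <pid>`). A minimal Python sketch, assuming a hypothetical JSON config file at `/etc/mydaemon.json`; the handler only sets a flag and the main loop does the reload, since doing real work inside a signal handler is asking for trouble.

```python
import json
import signal

CONFIG_PATH = "/etc/mydaemon.json"  # hypothetical config file


class Daemon:
    def __init__(self, config_path=CONFIG_PATH):
        self.config_path = config_path
        self.reload_requested = False
        self.config = self.load_config()
        # Conventional UNIX behavior: SIGHUP means "re-read your config".
        signal.signal(signal.SIGHUP, self._on_sighup)

    def load_config(self):
        with open(self.config_path) as f:
            return json.load(f)

    def _on_sighup(self, signum, frame):
        # Keep the handler tiny; the main loop does the actual reload.
        self.reload_requested = True

    def tick(self):
        """One iteration of the daemon's main loop."""
        if self.reload_requested:
            self.reload_requested = False
            self.config = self.load_config()
        # ... do the daemon's actual work using self.config ...
```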

> Is See Other any more difficult to remember than SIGHUP?

It's strange. Everyone knows about HUP, but so few Web developers know about the proper use of SEE_OTHER vs. TEMPORARY_REDIRECT.

> Securing yourself against session fixation attacks? Well if you're coding in UNIX, I hope you pay attention to how you handle symlinks and temp files.

Good point, although those are easily fixable by using the right library. You can't prevent cross site request forgery attacks as easily. Your argument would be better if you said, "Remember how bad buffer overflow vulnerabilities were in C?"
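For temp files in Python, "the right library" is `tempfile`. A minimal sketch; the `myapp-` prefix and `write_scratch_file()` name are made up.

```python
import os
import tempfile


def write_scratch_file(data: bytes) -> str:
    """Write data to a fresh temporary file and return its path.

    Unsafe pattern (don't do this): open("/tmp/myapp.tmp", "wb"). Another
    local user can pre-create that path as a symlink to one of your files,
    and you'll happily overwrite it.

    tempfile.mkstemp() picks an unpredictable name and creates the file with
    O_EXCL, so it fails rather than following an existing symlink, and the
    file is readable and writable only by the current user.
    """
    fd, path = tempfile.mkstemp(prefix="myapp-")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path
```

The caller is responsible for deleting the file when it's done; `tempfile.NamedTemporaryFile` can handle that automatically if the file doesn't need to outlive the `with` block.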

> SQL injection attacks? If you're coding in UNIX, I hope you're paying attention to your shell variables and quoting them properly.

Yes, SQL injection attacks and shelling out are both easy to get right if you use the right function. I.e., don't use string interpolation.
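In Python that means letting the database driver and `subprocess` do the quoting for you. A minimal sketch using the standard library's `sqlite3` (its `?` placeholder style; other drivers may use `%s`) and `subprocess.run()` with an argument list. The `users` table and the use of `wc` are just for illustration.

```python
import sqlite3
import subprocess


def find_user(conn, name):
    # The driver passes `name` separately from the SQL text, so a value like
    # "x' OR '1'='1" is just an odd user name, not part of the query.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


def count_lines(path):
    # An argument list (and no shell=True) means no shell ever parses
    # `path`, so a file named "foo; echo pwned" is just a file name.
    result = subprocess.run(["wc", "-l", path], capture_output=True,
                            text=True, check=True)
    return int(result.stdout.split()[0])
```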

> As for:

> > The Web is a strange place where the HACKS become the standard by which you get stuff done.

> Isn't UNIX the very system Worse is Better was written about?

Yes, it is. That's why I said, "Far worse is far better." It was a joke.

> I completely agree that there is a hell of a lot to remember when web programming. But I think you're blinded by familiarity to the equivalent complexity present in UNIX applications.

Maybe, although I'm pretty familiar with both. The fact of the matter is, we're operating with much, much more code these days. There are more pieces and more APIs.

> The simple fact of the matter is that the complexity of developing a solution to a problem is primarily caused by the complexity of the problem rather than the platform.

I wish. Coding in C is easier than coding in assembly because you don't need to think of quite so many details. The same is true of C vs. Python. Web programming requires us to think of too many details at too many layers these days.

> Some platforms make it easier than others, sure, but the web isn't significantly different to UNIX in this respect. Your article only makes it seem that way because it highlights all the complexity in the web while ignoring the equivalents in UNIX.

By the way, this rant was meant to be humorous. Thanks for reading ;)

Darrin Eden said...

Awesome rant! I wonder what the ratio of people using HTTP to people using pipes is. 100K to 1?

Shannon -jj Behrens said...

> Awesome rant! I wonder what the ratio of people using HTTP to people using pipes is. 100K to 1?

haha

Thanks, Darrin.

Jim said...

> You use PAM. Every app on the system can make use of PAM.

So if you propose using a library to handle the messy details of forgotten passwords and the like for UNIX, why can't web developers use a library to handle the messy details too?

> > You need to know JavaScript for the web? Well you need to know Bourne for UNIX. And sed. And awk. And grep.

> Yes, but they aren't very hard. They're all pretty straightforward. I learned sed and awk plenty well by simply reading a chapter on each in a UNIX book.

JavaScript isn't very hard. I learned JavaScript by imitating others, no book necessary.

> > Do I remember the HTTP response codes for See Other and Temporary Redirect? How is that different to memorising the signals to make a daemon re-read its config?

> HUP. They all use HUP.

And all browsers use the same response codes to mean the same thing too. Send a See Other to a browser and they all do the same thing. There are other response codes that do other things that you also need to memorise for the web, but there are other signals that do other things that you also need to memorise in UNIX too.

> It's strange. Everyone knows about HUP, but so few Web developers know about the proper use of SEE_OTHER vs. TEMPORARY_REDIRECT.

Not strange at all. There's a historical accident that made Found ambiguous, but that's all the legacy browsers supported when See Other and Temporary Redirect were invented, so everybody kept using it.

There are masses upon masses of similar historical accidents for UNIX too. The arguments to ps always get me.

> Good point, although those are easily fixable by using the right library. You can't prevent cross site request forgery attacks as easily.

Sure you can.

http://www.djangoproject.com/documentation/csrf/
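The general idea behind this kind of protection (often called the synchronizer token pattern; this is a generic sketch, not Django's actual implementation) is to put a secret, per-session token in every form and reject POSTs that don't echo it back. A forged form on another site can make the victim's browser send its cookies, but it can't read your pages to learn the token. `SECRET_KEY`, `csrf_token()`, and `csrf_ok()` are hypothetical names.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = secrets.token_bytes(32)  # hypothetical per-site secret


def csrf_token(session_id: str) -> str:
    """Token to embed as a hidden field in every form rendered for this session."""
    return hmac.new(SECRET_KEY, session_id.encode(), hashlib.sha256).hexdigest()


def csrf_ok(session_id: str, submitted: str) -> bool:
    """Check the token that came back with a POST."""
    expected = csrf_token(session_id).encode()
    # compare_digest avoids leaking how many leading characters matched;
    # comparing bytes also copes with non-ASCII junk in the submission.
    return hmac.compare_digest(expected, (submitted or "").encode())
```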

> SQL injection attacks and shelling out are both easy to get right if you use the right function.

Right. So pointing out that the web is more difficult than UNIX because of things like SQL injections, while ignoring shelling out isn't really on, is it?

> Web programming requires us to think of too many details at too many layers these days.

For perfectionists, sure. A lot of those layers can be ignored. Use an ORM, use a CMS, etc. Just like you can use something like curses to abstract terminal handling away, you can use something like GWT to abstract details of the web away. Leaky abstractions, sure, but UNIX is just as leaky.

Stephan Eggermont said...

Just switch to smalltalk, seaside and gemstone. Then you only have to worry about javascript.

Shannon -jj Behrens said...

> Just switch to smalltalk, seaside and gemstone. Then you only have to worry about javascript.

Seaside has been on my mind a lot lately. I wish I had someone to talk with about it.

Adam U. said...

Great rant, I remember having a similar chat with you about this topic like 6 or 7 years ago :-) Funny how things come around again.

Funny thing, I was a web developer once, but nothing I wrote is on the web anymore. However, crappy little Unix tools and integration scripts I wrote are still alive, often to my surprise, occasionally to my chagrin.

The biggest difference between Unix and the Web is not the learning curve, but the length of time the knowledge is useful.

For a while I was doing nothing but integration work with a Cobol-based ERP. Essentially, Cobol/Perl/sed/awk/sh all day. I could've been zapped 10 years into the future or 10 years into the past and my skills would've been just as relevant. Can't say the same thing about web development... unless you know someone seeking a ColdFusion 4.5 developer ;-)

Anonymous said...

It's hopeless.

It's too easy to throw something together, which "sortta" works, but has no conceptual integrity.

I think another problem is that Open Sores software is, by definition, AT BEST, a Beta, more typically an Alpha. So, it should not be surprising if the quality is, ahh, not as high as it might be.

Like most stuff that gets buzz, Django "sortta" works. Works well enough that it has a user base.

I keep waiting for the Grand Unification of Pylons and Gears - maybe soon enough...

In the meantime, as I suffer with JavaScript, CSS, DHTML, the DOM, XML, and all that _shit_ - well, I'm ready to have heated discussions with every Netscape employee who had anything to do with this disaster.

I'm tempted to blame the Stanfords, CMUs, and MITs for this current mess, because they didn't step up and produce leadership - (CLEARLY, none of their students or graduates were involved in this web disaster!).

So we are forced to deal with the poop of The Inexperienced - and that is NEVER fun!

If people had their wits about them when web standards were being solidified, they would have chosen Lisp as the browser's language, and they would have chosen S-expressions instead of XML/JSON (and, by default, a UNIFIED DOM - what a concept!)

It's enough to make me heave... Something as simple as looking up something in a database, re-formatting into JSON, and getting the client to actually receive it so it can display it requires an absolutely humongous amount of code/servers/etc - just for that!

The Web standards feel like they've been invented by high school students on crack - certainly, no Adult Supervision was involved.

The other thing Web Standards do is help foster the Web Programmer's Full Employment Program - because so much of what a Web developer has to do is deal with trivial mind-numbing bullshit.

The other thing is that the most trivial of applications seem to need these _huge_ Systems Approaches - it's like, nobody ever heard about how to build small, efficient, quickly-written systems.

Apparently, while I was building h/w in the 90s and early part of the 2000s, the s/w world went to hell, and I wasn't paying enough attention to save it. I _deeply_ apologize.

-Mike

Anonymous said...

I must agree about the difference between web development and UNIX except for one thing - you say that a web interface will have to be rewritten in five years - it is more likely two years (or five months).

sampablokuper said...

I don't know if you've seen NetKernel. In what I think is rather a nice way, it somewhat muddies the distinction between UNIX and the Web :)

Shannon -jj Behrens said...

> NetKernel

Weird!