Thursday, February 28, 2008

Scheme: Procedures vs. Data

I'm going through Structure and Interpretation of Computer Programs right now. One thing Abelson really enjoys is blurring the distinction between procedures and data. Coming from other languages with higher-order functions, I'm comfortable with the idea. For instance, I understand map, filter, reduce, etc. However, he takes the idea much further. He's just as likely to put a bunch of procedures into an abstract data structure as he is to put data into an abstract data structure. Since procedures can carry with them a bit of data via closure, it all works out. He even calls such procedures "objects".
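To make that concrete, here is a small Python sketch of a data structure full of procedures that share a bit of state through a closure. The names (make_account, deposit, withdraw) are my own, picked for illustration; this is not code lifted from the book.

```python
def make_account(balance):
    """Return a dict of procedures that all close over `balance`."""

    def deposit(amount):
        nonlocal balance
        balance += amount
        return balance

    def withdraw(amount):
        nonlocal balance
        if amount > balance:
            raise ValueError("insufficient funds")
        balance -= amount
        return balance

    def get_balance():
        return balance

    # The "object" is just data (a dict) whose values happen to be procedures.
    return {"deposit": deposit, "withdraw": withdraw, "balance": get_balance}


account = make_account(100)
account["deposit"](50)
account["withdraw"](30)
```

The dict itself is ordinary data; the only hidden state is the balance that the three procedures share.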

He loves to mix up data and procedures, because he says it "doesn't really matter." Thinking about it, that's somewhat true and somewhat untrue. Clearly, procedures can be passed around as data, and that's fine. However, there is a difference.

If a procedure takes a value and returns a value, you can think of that procedure as data with blanks that haven't been filled in yet. That's fine. In that case, the distinction between procedures and data isn't that important.

However, if a procedure has side effects like IO, that's very different. For instance, data doesn't ever do stuff like deleting your hard disk. In fact, data isn't even apply-able, so you can't even call it in the first place like you can a procedure. Procedures are only the same as data as long as you never actually call them. As soon as you call them, they have the option of being very different.
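Here's a minimal Python sketch of that difference. The names pure_double, noisy_double, and log are made up for illustration, and an in-memory list stands in for real IO so that nothing dangerous actually happens.

```python
log = []  # stands in for "the world"; a hypothetical substitute for real IO


def pure_double(x):
    """Takes a value, returns a value, touches nothing else."""
    return 2 * x


def noisy_double(x):
    """Same result, but also changes state outside itself."""
    log.append(x)
    return 2 * x


# Holding procedures in a data structure is inert: nothing has happened yet.
procedures = [pure_double, noisy_double]
assert log == []

# For a fixed set of inputs, the pure procedure can be swapped for a table.
table = {x: pure_double(x) for x in range(5)}
assert all(table[x] == pure_double(x) for x in range(5))

# Calling is where the two diverge: the results match, the world does not.
results = [p(21) for p in procedures]
assert results == [42, 42]
assert log == [21]
```

No table of return values could stand in for noisy_double, because part of what it does never shows up in its return value.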

Since I'm sure all the Haskell authors have studied Lisp, I can see why they make such a big deal about the IO monad. A function that doesn't use the IO monad can be treated as inert data. A function that does use the IO monad is very different. Sure, you can still mix, match, combine, etc. such a function, but as soon as you apply it, it can change the state of the world. I think that's a real difference.

Monday, February 25, 2008

A Hybrid World of Open and Closed Source Software

Open Source Was a Lie

Okay, now that I have your attention, let me explain what I mean. Part of the promise of the Open Source movement was that it would produce higher quality software. Looking back, it's clear that this isn't always true. It turns out, there are many examples on both sides. Sometimes the open source option is better, and sometimes the closed source option is better. For instance, Apache is far more secure and robust than IIS. Linux is far more reliable than Windows 95. On the other hand, Photoshop is way nicer than the GIMP, and my Java friends have told me that IntelliJ IDEA is nicer than Eclipse. I've even heard that Solaris still beats Linux when it comes to NFS robustness.

Stallman Was Right All Along (It's about the freedom, baby! Yeah!)

Stallman actually predicted the above. In the early days, he was pulling his hair out screaming, "It's not about the quality! It's about the freedom!" That's why I now put myself in the Free Software camp.

Unfortunately, at this point, I don't think a Stallman-esque utopia of Free Software is going to happen. Software is a business. There are many business models that work well with Free Software. However, there are many more business models that can work using a hybrid of open and closed source software. Therefore, it should be no surprise that this is what the industry has settled on.

Absolutes Don't Last

Absolutes always seem to mellow in the long term. Communism fell in the Soviet Union in 1991, but one could argue that capitalism fell in the United States back in 1929. The truth of the matter is that the US has never had pure capitalism. It has steadily become more a composite of capitalism and socialism since the time of Franklin Roosevelt.

Similarly, CISC vs. RISC used to be a hot debate. These days, it's an almost irrelevant distinction. What happened is that chips became RISC on the inside and CISC on the outside.

FOSS Stopped the Monopoly

In a sense, the biggest thing we have to thank Free and Open Source Software for is that it stopped the monopoly. To some degree Microsoft had already won the game. FOSS opened it up again.

At the risk of sounding melodramatic, I like to think of it as an epic war between two semi-autistic geniuses. Gates had conquered the major battle for shrink-wrap software. Stallman simply changed the rules of engagement for future encounters by creating the Free Software movement. Suddenly, small companies could hold their own without being crushed the minute Microsoft decided to enter a new market. After all, it's never a good idea to let your enemy be your supplier.

Of course, it remains to be seen how badly patents will hurt all those small companies. As far as I can tell, it's impossible to write any decently-sized software system without violating someone's patents. The bigger the competitor, the more likely they'll hold patents that your code violates.

FOSS Meant an End to the Nightmare

When I was in college, I didn't have access to the source code for my operating system (Windows 95) or a free compiler (I was using Borland's Turbo Pascal). When I learned C++, I was too poor to afford a C++ compiler. It was then that my professor gave me a Linux distro (Slackware, of course). It was a quasi-religious discovery for me. I was no longer limited by money, and I had access to the source code for everything. The only thing that limited me was my own mind. FOSS meant an end to the nightmare of living in a closed source world.

In a sense, I've never fully recovered from that. That is why I have such a hard time accepting software like OS X, TextMate, Perforce, Jira, etc. Don't get me wrong, I've heard that they're all great. However, to me, they represent a return to a nightmare that I thought I had left behind.

Innovating in a Hybrid World

These days, the companies that are doing the best are the ones that are innovating in a hybrid world. For instance, at the surface, LAMP stands for Linux, Apache, MySQL, and PHP. However, at a deeper level, LAMP represents a gold rush of companies using FOSS to build customer-facing, closed source Web sites in the hope of being overnight millionaires. In a few dozen cases, it's even worked.

Apple

One company that's doing really well in a hybrid world is Apple. OS X is a beautiful mix of open and closed source software. I think the FOSS movement played a large role in bringing them back from the dead. It provided a great base for them on which to innovate.

Frustratingly, much of what they did best (e.g. Cocoa and Aqua) remains closed source. In a hybrid world, that's a very reasonable business decision, but it's still frustrating for me. Of course, the little Stallman voice inside my head likes to remind me that this is nothing new. Cocoa is just another in a line of nice, but closed source GUI toolkits such as Motif and Qt (although Qt became GPL later).

I hate the way Apple treats its customers. However, what can you do? Fork the source code? In a hybrid world, you have access to a lot of their source code, but forking the whole thing is simply not an option. That's critical for Apple's success, of course. A hybrid world presents a mixed bag of pros and cons for all those involved.

Google (Linux for the win!)

Google is another company that's doing extremely well in a hybrid world. Compared to Apple, they've been even more helpful to the FOSS world. It's not just the big things such as Google Code (which includes Summer of Code and project hosting); it's also the little things such as the fact that it lets my local Python interest group host its meetings on campus. They also let Guido van Rossum, the author of Python, spend 50% of his time working on Python with no strings attached.

Like Apple, Google has also strategically withheld the source code for much of its "special sauce." I can understand why their search algorithms are closed source, but I really do wish they'd open up their implementations of MapReduce, GFS, and BigTable. (To be fair, they did write white papers on them, which was a great help for the rest of the world.)

IBM, Oracle, and Sun

Transitioning to a hybrid world was great for IBM. After they botched the DOS deal with Microsoft and got their lunch eaten with OS/2, they focused on hardware, mainframes, and consulting. When it turned out that it took specialists to make Linux pay off, they stepped up to the plate saying, "Hey, we can provide consultants for that too!"

Oracle jumped onto the bandwagon as well. They embraced Linux, but I'm not under the delusion that it was based on philanthropy. They used Linux as a way to stab their partner Sun Microsystems in the back and try for a bigger piece of the pie.

As for Sun Microsystems, a whole book could be written about their love-hate relationship with FOSS. Does anyone remember downloading Linux from SunSITE? They finally opened up Java after years of fighting against it, but the battle rages on. If you go to download OpenSolaris, you'll quickly find out that the "OpenSolaris project does not provide an end-user product or complete distribution." They open sourced a bunch of code, but not enough to run a complete system. I guess living in a hybrid world is what they've always wanted.

My Heroes

In trying to make sense of this whole mess, I've paid close attention to my heroes.

Guido van Rossum, the author of Python, and Alex Martelli, the author of two Python books, both carry Apple laptops.

Bram Moolenaar, the author of Vim, also totes an Apple. I really look up to Bram since Vim is charityware. (I am sympathetic to Bram's sympathy for orphans.) When I asked Bram about his laptop, he said something like, "Open Source is a lot of fun, but everyone has to make money, and closed source software is an easy way to do it."

More than half of the FreeBSD committers I know (which is somewhere between 7 and 10) use Apple laptops or desktops these days. Perhaps this should come as no surprise. OS X uses a lot of FreeBSD code. Jordan Hubbard, who co-founded the FreeBSD project, is now the "Director of Engineering of Unix Technologies" at Apple. The little Stallman voice inside my head likes to remind me that the license used by the FreeBSD project is sympathetic to closed source derivatives.

Even Linus Torvalds, the author of Linux, wasn't against using a closed source revision control system, although he later decided to write his own.

Then there's my mentor Mike Cheponis. Mike is an old-school hacker from MIT. He also worked on Unix back when Unix was an actual OS and not just a "style" of OS. Having worked at Apple, he doesn't share my distaste for it. On the other hand, he agrees with me that behind closed doors Apple is just another Microsoft-wannabe. He carries an Apple laptop that also has Windows and NetBSD on it.

Last (but in no way least), there's Stallman. He still has not given up the dream. He started the Free Software movement as a result of the crushing loneliness he faced when all of his friends left to form Symbolics and excluded him. I met Stallman for the first time a couple months ago. I actually gave him a hug. Love him or hate him, it's impossible to disagree with the fact that he's made the software world a better place. Sometimes we need extremists to push us in the right direction. On a personal note, my only regret writing this is that he would be disappointed in me for conceding to a hybrid world. The little Stallman voice in my head likes to chide me saying, "But we've come so far! How can you give up now?"

Picking the Future

Okay, so we're not going to have a world of absolutes. What do we do about it?

Should we abandon the GPL for more commercial-friendly licenses such as the one used by the FreeBSD project? I don't think so. If a company wants to make use of and contribute to FOSS, that's great. However, if I want to write a library in my spare time, and I don't want anyone else producing closed source derivatives of my work, so be it. Linux and FreeBSD use very different licenses, yet they are both a benefit to the world. Anyone who works at a company and complains that they can't make derivative, closed source works of GPLed software should remember that they can't make any derivative works of proprietary software. A hybrid world is still better than a closed source world.

We'll need to choose what we insist on being Open Source and what we'll concede to using closed source. I strongly believe that any programming language that isn't Open Source isn't worth using. Similarly, it's quite unfortunate to write Open Source software that relies on closed source libraries. Let's also give three cheers for Firefox. Closed source software is one thing, but an entire Web that's only viewable using a closed source browser would be a tragedy indeed (which is the primary reason I don't like Flash).

On the other hand, I don't think it's the end of the world that TurboTax is closed source. Writing tax software has got to suck, and I have no wish whatsoever to ever work on it.

What about TextMate, which I mentioned earlier? From what I've heard, Allan Odgaard has done an excellent job in producing a beautiful, friendly, and powerful editor. I don't fault him in the least for wanting to make a living building such an editor. Nor do I fault anyone else for using it. However, for me personally, using a closed source editor 8-10 hours a day sounds like a nightmare. In general, programmers love writing editors; I know that I've contributed several patches to Vim. There's no reason for me to ever settle on a closed source editor when it's such a core part of my life as a coder. I'd sooner write my own from scratch.

Conclusion

What we have now and what we will continue to have is a hybrid world of open and closed source software. To those who think that there's no place for Open Source software in the commercial world: Good luck trying to take on Apple and Google! To those who think the Free Software movement will continue and conquer the world: I sympathize. Don't flame me for believing as I do. Go write more Free Software! After all:

The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man. -- George Bernard Shaw

Scheme: Implementing cons, car, and cdr

I'm going through Structure and Interpretation of Computer Programs per the recommendation of my buddy Mike Cheponis. I'm really enjoying it.

I always thought cons, car, and cdr were so low-level that they had to be builtins. However, interestingly enough, SICP shows how they could be implemented using a closure:

    (define (my-cons a b)
      (lambda (pick)
        (cond ((= pick 1) a)
              ((= pick 2) b))))

    (define (my-car x) (x 1))
    (define (my-cdr x) (x 2))

It's kind of silly, but it also works in Python:

    def cons(a, b):
        # The returned closure remembers a and b; the closure is the "pair".
        def pair(pick):
            if pick == 1:
                return a
            elif pick == 2:
                return b
            else:
                raise ValueError(pick)

        return pair

    def car(pair):
        return pair(1)

    def cdr(pair):
        return pair(2)

Neat!

It's easy to see how to extend this in Scheme to have "objects" with any number of members, although I'm sure it's not very efficient.
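As a rough sketch of that extension, staying in Python for convenience: the closure can dispatch on a member name instead of on 1 or 2. The name make_record is hypothetical, and capturing the keyword arguments as a dict is just a shortcut; a more literal translation would use one cond clause per member.

```python
def make_record(**members):
    """Return a closure that maps a member name to its captured value."""

    def dispatch(name):
        if name not in members:
            raise KeyError(name)
        return members[name]

    return dispatch


point = make_record(x=3, y=4, label="origin-ish")
point("x")      # looks up the member named "x"
point("label")  # any number of members works the same way
```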

By the way, I really like DrScheme. It's relatively modern and very friendly.

Saturday, February 16, 2008

Books: The Myths of Innovation

I just finished reading The Myths of Innovation. It's a short, enjoyable read, and I recommend it. I kept track of some of my favorite quotes:

By idolizing those whom we honor we do a disservice both to them and to ourselves...we fail to recognize that we could go and do likewise. -- Charles V. Willie [p. 1]

Freeman Dyson, a world-class physicist and author, agrees, "I think it's very important to be idle...people who keep themselves busy all the time are generally not creative." [p. 12]

As Howard Aiken, a famous inventor, said, "Don't worry about people stealing an idea. If it's original, you will have to ram it down their throats." [p. 59]

As William Gibson wrote, "The future is here. It's just not widely distributed yet." [p. 66]

[Alex F. Osborn wrote about finding ideas:]
  • Produce as many ideas as possible
  • Produce ideas as wild as possible
  • Build upon each other's ideas
  • Avoid passing judgment. [p. 92]

Jobs explains, "I'm convinced that about half of what separates the successful entrepreneurs from the non-successful ones is pure perseverance." [p. 107]

Even the (false) proverbial mousetrap, as historian John H. Lienhard notes, has about 400 patents for new designs filed annually in the U.S., and we can be certain that no one is beating down their doors. More than 4000 mousetrap patents exist, yet only around 20 ever became profitable products. These days, the best equivalent to the metaphoric mousetrap is "to build a better web site," proven by the 30,000 software patents and 1 million web sites created annually. Certainly not all of these efforts are motivated by wealth or wishful thinking, but many inventors still hope that the "If you build it, they will come" sentiment is alive and strong. [p. 113]

DDT and airplanes were a perfect match. Here [image of plane spraying DDT], DDT is being used on cattle to give them extra special flavor. [p. 142]

Automobiles speed the police to crime scenes, but they also help thieves get away. The rising tide of technology raises all boats. [p. 146]

The best philosophy of innovation is to accept both change and tradition and to avoid the traps of absolutes. As ridiculous as it is to accept all new ideas simply because they're new, it's equally silly to accept all traditions simply because they're traditions. Ideas new and old have their place in the future, and it's our job to put them there. [p. 147]

Computer Science: Offset-based Linked Lists

Note: I'm speaking a little loosely, and I'm not a C expert, but I think this post is still interesting.

There are many ways to "architect" and implement lists.

These days, it's the norm in languages like Java, Python, etc. for the language or language library itself to provide a list implementation. You simply create a list and then start shoving objects into it. Python, Ruby, Perl and PHP all provide native syntax for lists. The algorithm used is more sophisticated than a singly or doubly-linked list because lists must double as arrays in those languages. Java and C++ have native arrays and array syntax, but they also provide a variety of list implementations in their libraries. Both can use generics to constrain the type of objects you put in those lists. One thing to note, however, is that there's a distinction between the list itself and the items in the list.

Since C's standard library doesn't provide a list implementation, a variety of approaches are used. It's not uncommon to create structs that have prev and next pointers in them and to manage the list manually. In general, it seems common in C to create data structures manually and to simply manage the data structure as part of the overall programming task. This is in stark contrast to, say, Python where the list implementation is in C and the code using the list is in Python.

My buddy Kelly Yancey once showed me that FreeBSD had macros so that programmers wouldn't have to keep reimplementing linked lists all the time in the kernel. I think macros were used instead of functions so that the code behaved as if it were actually written inline, thus avoiding the function call overhead--but I could be wrong.

At Metaweb, I had a buddy named Scott Meyer who used to work at Oracle. He showed me a pretty interesting trick for creating linked lists. As before, he had structs which contained application data as well as next and prev pointers for managing the structs in a linked list. However, rather than manipulate those linked lists manually, he wrote separate, generic linked list functions. The question is: how can a generic linked list function operate on random structs (i.e. void *'s) that just happen to have prev and next fields somewhere within them?

For each type of struct, he would create a "descriptor" struct that contained the offset of the prev and next fields. He would pass the descriptor to the function anytime he needed to manipulate the linked list. Loosely speaking, it's like saying, "Hey, I got this linked list, and I want you to insert a new member into it. You might not know anything about the type of structs in the linked list, but I can tell you that the prev pointer is 16 bytes from the beginning of the struct, and the next pointer is 20 bytes from the beginning."

I think the lesson is deep. If you want to write a function that operates generically on structs that you know have certain fields, you just have to tell that function the offset of those fields. Naturally, you don't want to just pass the offsets. Rather, you pack them up into a struct. Creating such a struct is like declaring that you implement some interface. I had seen how structs full of function pointers doubled as interfaces in Apache, but structs full of offsets was something entirely new for me.
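Here's a rough sketch of the idea. It's in Python using ctypes rather than C, and it's my own reconstruction, not Scott's code; ListDescriptor, Employee, insert_after, and walk are all made-up names. The generic functions only ever see raw addresses plus a descriptor holding the two offsets.

```python
import ctypes
from collections import namedtuple

# Hypothetical descriptor: byte offsets of the link fields within a struct.
ListDescriptor = namedtuple("ListDescriptor", ["prev_offset", "next_offset"])


class Employee(ctypes.Structure):
    # Application data comes first, so the links sit at nonzero offsets.
    _fields_ = [
        ("employee_id", ctypes.c_int),
        ("salary", ctypes.c_double),
        ("prev", ctypes.c_void_p),
        ("next", ctypes.c_void_p),
    ]


# "Declaring the interface": where prev and next live inside an Employee.
EMPLOYEE_LIST = ListDescriptor(Employee.prev.offset, Employee.next.offset)


def _link(address, offset):
    """View the pointer-sized slot at address + offset."""
    return ctypes.c_void_p.from_address(address + offset)


def insert_after(desc, node_addr, new_addr):
    """Insert the struct at new_addr after the struct at node_addr."""
    old_next = _link(node_addr, desc.next_offset).value  # None means NULL
    _link(new_addr, desc.prev_offset).value = node_addr
    _link(new_addr, desc.next_offset).value = old_next
    _link(node_addr, desc.next_offset).value = new_addr
    if old_next is not None:
        _link(old_next, desc.prev_offset).value = new_addr


def walk(desc, head_addr):
    """Yield the address of each struct, following next pointers."""
    addr = head_addr
    while addr is not None:
        yield addr
        addr = _link(addr, desc.next_offset).value


# The caller must keep the structs alive; the list only stores addresses.
a, b, c = Employee(1, 10.0), Employee(2, 20.0), Employee(3, 30.0)
insert_after(EMPLOYEE_LIST, ctypes.addressof(a), ctypes.addressof(c))
insert_after(EMPLOYEE_LIST, ctypes.addressof(a), ctypes.addressof(b))
ids = [Employee.from_address(p).employee_id
       for p in walk(EMPLOYEE_LIST, ctypes.addressof(a))]
```

Note that insert_after and walk never mention Employee. Any other struct with two pointer-sized link fields could reuse them by supplying its own descriptor.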

Neat!

Personal: Intentionally Unemployed

Yesterday was my last day at Metaweb. Metaweb's a great company, and I enjoyed my three-month stay there, but after finishing the project I was working on, I knew it was time to take a break.

I've worked at five startups in a row. When I left IronPort, I wanted to "see the world" and see new ways of writing software. I've had a good time, but it's taken its toll on me. Six months ago, I was learning Ruby on Rails, doing a Facebook startup using it, and trying to take care of my family as my wife gave birth to our fourth child. You can see why I'm kind of burnt out. I'm hoping to take a month or so off.

I've now been doing Python Web development on Linux (on and off) for about seven years. When I started, there were few Python jobs, and I was the only one at my company who insisted on using Linux for his desktop. There was no OS X. Everyone used Windows. I was coding in PHP working with the author of the first book (which had just come out) on PHP, Leon Atkinson. It's amazing how the world has changed.

I have no clue what I'll do next. I'd like to do something new. Unfortunately, looking at Craigslist, it seems like the industry is a bit slow right now, and there doesn't seem to be much new going on. Hopefully, I'll find something new and interesting. Today, I'm hanging out at SuperHappyDevHouse in order to recharge my batteries and get excited about programming again.

Happy Hacking!

Monday, February 04, 2008

Concurrency and Python

I wrote an article for Dr. Dobb's Journal: Concurrency and Python.

Abstract: Stackless Python, Erlang, and greenlets are interesting approaches to concurrency.

Enjoy!