Monday, April 18, 2011

Ruby: nil is a Billion Dollar Mistake

The fact is, I really like Ruby. However, there are some ways in which it uses nil that I really disagree with. For instance, in Ruby, if you try to look up a key that doesn't exist in a hash, you get a nil. Similarly, if you try to reference an @attribute that hasn't been set yet, you'll get a nil.
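
For example, both of these quietly evaluate to nil (the hash and the Widget class here are made up purely for illustration):

h = {:foo => 1}   # toy hash
h[:food]          # => nil (no hint that I probably meant :foo)

class Widget      # toy example class
  def price
    @pryce        # misspelled instance variable; silently evaluates to nil
  end
end

Widget.new.price  # => nil, no error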

That reminds me of this article, Null References: The Billion Dollar Mistake:
Tony Hoare introduced Null references in ALGOL W back in 1965 “simply because it was so easy to implement”, says Mr. Hoare. He talks about that decision considering it “my billion-dollar mistake”.
Compounding this problem is Ruby's current lack of real keyword arguments (although I know they're coming). Hence, if you pass a keyword argument like f(:foo => 1) and then misspell it inside the function as options[:foooo], the misspelling quietly results in a nil, as if the argument hadn't been passed at all. This masks a real problem. All of these behaviors have resulted in real bugs in my code and lots of frustration.
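
Here's the kind of thing I mean (f and its option names are just placeholders):

def f(options = {})
  options[:foooo]   # misspelled key; quietly returns nil
end

f(:foo => 1)   # => nil, with no hint that I got the name wrong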

It doesn't have to be this way. In Python, if you try to look up something in a dict that doesn't exist, you get a KeyError exception. If you try to use an attribute that doesn't exist, you get an AttributeError exception. If you misspell a keyword argument, you get a TypeError exception. Exceptions are nice because they catch the problem right away instead of allowing it to fester. In Ruby, if I get a nil that I'm not expecting, I might not find out that there's a problem until much, much later in a different part of the code when I try to call a method on the nil thinking it's a real object.

Since I'm on the subject of hashes, I think I should mention that Haskell and Scala take a different approach. If you look up a key that may not exist in a Scala Map, you get back an Option instance. An Option instance may or may not contain a real value. (Similarly, Haskell uses the Maybe monad.) You have to do work to get the value out of the Option instance, and the type system will catch you if you just blindly assume that there's always something in it. This is one case where an ML-style type system can help you avoid a whole class of bugs.
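
To give a rough flavor of that idea in Ruby terms, here's a tiny, purely illustrative Option-like wrapper (not a real library, and of course Scala and Haskell back this up with compile-time checks that Ruby can't provide):

# Toy Option type: Some wraps a value, None represents "nothing here".
class Some
  def initialize(value); @value = value; end
  def get; @value; end
  def get_or_else(default); @value; end
end

class NoneClass
  def get; raise "get called on None"; end
  def get_or_else(default); default; end
end
None = NoneClass.new

def lookup(hash, key)
  hash.key?(key) ? Some.new(hash[key]) : None
end

lookup({:a => 1}, :a).get_or_else(0)   # => 1
lookup({:a => 1}, :b).get_or_else(0)   # => 0; calling .get here would raise instead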

38 comments:

pzol said...

If you use the fetch method in a hash you can get an exception
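
For instance (toy hash purely for illustration):

h = {:foo => 1}
h.fetch(:foo)    # => 1
h.fetch(:food)   # raises KeyError (IndexError on Ruby 1.8)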

Sean Huber said...

I suppose if you really wanted to you could implement that kind of behavior (I love that ruby is flexible enough to do this):

Hash#fetch raises KeyError like you expected - you could also Hash.send(:alias_method, :[], :fetch) if you want to make that the default behavior.

Similar behavior can be implemented for instance variables as well:

Object.class_eval do
  # Like instance_variable_get, but raises if the variable hasn't been set yet.
  def instance_variable_fetch(symbol)
    if instance_variable_defined?(symbol)
      instance_variable_get(symbol)
    else
      raise KeyError.new(symbol)
    end
  end
end
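
Usage would look something like this (Widget is just a made-up example class, assuming the patch above is loaded):

class Widget
  def initialize
    @name = "sprocket"
  end
end

w = Widget.new
w.instance_variable_fetch(:@name)   # => "sprocket"
w.instance_variable_fetch(:@nome)   # raises KeyError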

Shailen Tuli said...

You can force an exception in Ruby with fetch() and you can prevent the exception in Python with get(). Both languages get this right, I think, although the default behavior is different.

Also, you can just check to see if a key exists before referencing it, right?

Mark Wilden said...

If this behavior bites you, that just means you haven't written the right tests. In general, Ruby tries to provide reasonable defaults rather than protect the programmer from himself.

Anthony said...

I have to agree with Mark, this is really an issue only if you have insufficient test coverage.

Edward J. St said...

The Ruby-esque pattern would be two similar methods: one returns nil, and the other, with a bang suffix, raises the error.

For example:

hsh[key] -> value or nil
hsh[key]! -> value or KeyError

Shannon -jj Behrens said...

> If you use the fetch method in a hash you can get an exception

Thanks. That's helpful.

Shannon -jj Behrens said...

> I suppose if you really wanted to you could implement that kind of behavior...

Nice trick ;) I was thinking that you could do that for :fetch, but I didn't know you could do it for instance_variable_fetch. I think it makes sense for someone to wrap all of these into a nice library called "strict", and then people could use that library on their application. (That reminds me of ActiveSupport in Rails.)

Shannon -jj Behrens said...

> You can force an exception in Ruby with fetch() and you can prevent the exception in Python with get(). Both languages get this right, I think, although the default behavior is different.

I think the default should be to raise an exception, of course.

> Also, you can just check to see if a key exists before referencing it, right?

That results in more code and two hash lookups. Furthermore, you might not always remember to do it.

Shannon -jj Behrens said...

> If this behavior bites you, that just means you haven't written the right tests. In general, Ruby tries to provide reasonable defaults rather than protect the programmer from himself.

I'm saying that Ruby's defaults are not correct (nil is a billion dollar mistake).

Protecting the programmer from himself is very good--that's why we don't do our own memory management these days.

Certainly you need tests, but having a test that shows you that a piece of code does work is no substitute for a runtime that raises an exception when something isn't right. For instance, even when I write lots of tests, I still use assertions.

Shannon -jj Behrens said...

> I have to agree with Mark, this is really an issue only if you have insufficient test coverage.

When a test fails, would you prefer it to fail because you tried to call some method on nil (leaving you to wonder why you ended up with a nil in the first place), or would you prefer to get an exception at the point the problem originated?

I have very good test coverage, yet it still drives me crazy when my tests fail for reasons that take me a long time to diagnose.

Shannon -jj Behrens said...

> hsh[key]! -> value or KeyError

Is there such a thing as "[]!"?

Mark Wilden said...

No, dynamically-typed languages are explicitly against the concept of protecting the programmer from himself. If you want protection use C#.

Ruby is based on making programming fun (ask its inventor). In many places, this means providing reasonable defaults, so the programmer doesn't have to worry so much about whether, e.g., an instance variable or hash member or array member is initialized.

That doesn't mean it's "correct" - just that it's part of its philosophy. Maybe Ruby just isn't the right language for you?

I ask because you say you use lots of assertions, which, while well established in the C/C++/C# world, just isn't generally done with Ruby.

We spend that time writing clear code that doesn't lumber under the weight of type-checking and assertions. Instead, we write tests.

Shannon -jj Behrens said...

> No, dynamically-typed languages are explicitly against the concept of protecting the programmer from himself.

I'm going to have to disagree. In C, not only can you shoot yourself in your foot with buffer overflows, but you can arbitrarily point to random locations in memory, and you can forcibly cast a pointer to the wrong type. These are all things that Ruby doesn't allow you to do.

> If you want protection use C#.

Geez, you don't have to get nasty!

(Although, I've heard that C# is a nice language that adds a pinch of Haskell to Java.)

> Ruby is based on making programming fun (ask its inventor).

There's nothing about fighting an "undefined method for nil:NilClass" bug that is particularly more fun than a KeyError exception.

Furthermore, about half the programming books out there on any programming subject promise that the subject they teach is somehow going to be more fun than normal programming.

> Maybe Ruby just isn't the right language for you?

Just because I disagree with some core philosophy of a language doesn't mean I should be kicked out of the community. There are things I disagree with in every programming language I use.

> I ask because you say you use lots of assertions, which, while well established in the C/C++/C# world, just isn't generally done with Ruby.

There's a reason I use assertions. A few years ago, Microsoft Research published a paper with empirical data stating that programmers who used lots of assertions tended to have fewer defects than programmers who achieved 100% code coverage via heavy use of mocking and stubbing.

http://www.infoq.com/news/2009/10/exploding-myths

Sebas said...

As with fetch, you can get a KeyError by initializing the hash with a block that raises KeyError:

h = Hash.new { raise KeyError }

h[:foo] = 'bar'

puts h[:foo]     # prints "bar"
puts h[:foooo]   # raises KeyError
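
For what it's worth, the default block also receives the hash and the missing key, so the error can even name the key (a small variation on the same idea):

h = Hash.new { |hash, key| raise KeyError, "key not found: #{key.inspect}" }
h[:foo] = 'bar'
h[:foooo]   # raises KeyError: key not found: :foooo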

Mark Wilden said...

I used to use C# (and C++ and C). It's a fine language. It just has a different philosophy behind it than Ruby. Matz explicitly said that Ruby was to make programming fun. Ritchie, Stroustrup and Hejlsberg had no such goals for their languages.

WRT shooting yourself in the foot, I was talking about strong typing vs. weak typing. The goal of strong typing is to make programmers fix errors and to make programs run faster. Neither of these is among Ruby's goals.

I didn't read the same conclusions from the Microsoft studies that you did. They were separate studies. One said that assertions catch bugs (certainly true). Another said that 100% code coverage did not catch all bugs (also true). The studies did not compare these groups.

Shannon -jj Behrens said...

> Like with fetch, you can get a KeyError initializing the hash with a block raise KeyError

Nice tip! Too bad that doesn't help with the options parameter when you call a function like "f(:a => 1)".
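
The closest workaround I can think of is to call fetch on the options hash inside the method (a sketch only; f and :a are just placeholders):

def f(options = {})
  a = options.fetch(:a)   # raises KeyError if the caller misspelled or omitted :a
  # ... use a ...
end

f(:a => 1)   # fine
f(:b => 1)   # raises KeyError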

Shannon -jj Behrens said...

> I used to use C# (and C++ and C). It's a fine language. It just has a different philosophy behind it than Ruby. Matz explicitly said that Ruby was to make programming fun. Ritchie, Stroustrup and Hejlsberg had no such goals for their languages.

You're right.

> WRT shooting yourself in the foot, I was talking about strongly typing vs. weak typing. The goal of strong typing is to make programmers fix errors and to make programs run faster. Neither of these are Ruby's goals.

I agree with your point, but I dislike the term "weakly typed". K&R C is weakly typed compared to ANSI C. Ruby is strongly typed in that you can't treat a reference to a foo as if it were a reference to a bar. Hence, I prefer the terms "dynamically typed", "duck typed", or "latently typed".

(A C programmer might think that treating a pointer to a booger as an int is a good idea, but it's snot!)

> I didn't read the same conclusions from the Microsoft studies that you did. They were separate studies. One said that assertions catch bugs (certainly true). Another said that 100% code coverage did not catch all bugs (also true). The studies did not compare these groups.

Excellent point.

Mark Wilden said...

> I dislike the term "weakly typed". K&R C is weakly typed compared to ANSI C. Ruby is strongly typed in that you can't treat a reference to a foo as if it were a reference to a bar. Hence, I prefer the terms "dynamically typed", "duck typed", or "latently typed".

You are correct, sir. :)

Martin said...

I agree so much with your article. Most of the time programmers use the [] operator to fetch a value when in fact they should use fetch. The problem here is that programmers don't know their standard library well. And if that's the general case, I think the API itself has a problem.

On top of that, the bracket operator returns nil when the key is not found. It also returns nil when the value of a matching key is set to nil. So it has two meanings, which I think is wrong.

Mark, when you say that "Ruby is based on making programming fun": the "undefined method for nil:NilClass" error is the exception I lose the most time on. For me this is not fun at all. And most of the time it's not caused by me but by gem creators.

For me, if you have to fail, fail fast.

Shannon -jj Behrens said...

Thanks, Martin. You said it better than I did.

Dmytrii Nagirniak said...

And please do not remove my comment or otherwise explain the reason why you do that.

Shannon -jj Behrens said...

> And please do not remove my comment or otherwise explain the reason why you do that.

If you wish to leave a comment, please do so in a polite manner. I enjoy discussing differences of opinion, but messages that seem more inflammatory than useful will be deleted.

Aleksey Gureiev said...

The title of the post is way bolder than it should be, IMO. A billion dollar one? Really? :)

The whole ecosystem of Ruby is nil-centric if you will. The fact that you don't define class / instance variables upfront makes all the difference. A variable with any name can be defined at any moment and that's the beauty. Having those ugly existence checks all over the place would, if nothing else, make the code completely unreadable. If you need to check it specifically, go ahead -- there are several statements for that, but in most cases nil or not nil is all you need.

In the case with f(:a => 1), I personally don't care if it's f() or f(:a => nil). As simple as that. If there's a case when I need to account for nil's as valid values (which is extremely rare if you think about it), I can check for the key existence specifically.

I believe it's a case of old habits getting in the way. They aren't necessarily bad. They just don't apply everywhere.

As for kicking anyone out of the community, well, no one said that. All Mark meant is: why torture yourself if it goes so against the grain? There should be a reason why you decided to make this post (and post it on Ruby Reflector). Did you really expect that everyone would agree with you and say "oh, right, how did we live all these years"? When you post insights like this, you have to expect some argument and smirks, while 99.99% of the world will simply ignore it.

Cheers.

Dmytrii Nagirniak said...

@Aleksey, so to the point. Can't agree more.

Shannon -jj Behrens said...

> The title of the post is way bolder than it should be, IMO. A billion dollar one? Really? :)

The guy who invented the concept of Null wrote an article called "Null References: The Billion Dollar Mistake". My title is an allusion to that.

> The whole ecosystem of Ruby is nil-centric if you will.

Python and Ruby are very similar languages. Python uses exceptions in a lot of places where Ruby uses nil. There are a lot of things I like about Python and a lot of things I like about Ruby. However, the way Python prefers exceptions over nils is one place where I prefer the Python way.

> The fact that you don't define class / instance variables upfront makes all the difference.

That's true of Python too. However, if you try to access self.a before it's set, you get an exception rather than a nil. You still don't have to make any declarations.

> A variable with any name can be defined at any moment and that's the beauty.

Same in Python.

Shannon -jj Behrens said...

> Having those ugly existence checks all over the place would, if nothing else, make the code completely unreadable.

I don't know what ugly existence checks you're referring to. When I write "self.a" in Python, I don't have any ugly existence checks. Python will catch me, though, if I try to read "self.a" before I set "self.a".

> If you need to check it specifically, go ahead -- there are several statements for that, but in most cases nil or not nil is all you need.

Actually, if I try to access an instance variable that isn't set, I'd prefer the interpreter to catch me and raise an exception. That way I can correct my code. That's extra work for the interpreter, but it's not extra code for me.

> In the case with f(:a => 1), I personally don't care if it's f() or f(:a => nil).

Actually, my complaint about keyword arguments is that if you misspell one, you don't get an exception. In Python, you do. Misspellings have resulted in bugs for me, and they're hard to find. If Python raises an exception because I misspelled a keyword argument, it's trivial to spot and trivial to fix. To be fair, I think that my point may be slightly off topic, and I know it's going to be fixed in Ruby before too long.

> As simple as that. If there's a case when I need to account for nil's as valid values (which is extremely rare if you think about it), I can check for the key existence specifically.

> I believe it's the case when old habits are getting in a way. They aren't necessarily bad. They just don't apply everywhere.

I code in many languages. Notice that I referred to how Python, Ruby, Haskell, and Scala solve the problem of looking up something in a hash that doesn't exist? I think Haskell and Scala's approaches are the most modern. I certainly wouldn't call that an "old habit". Why is it that no one has commented on the Haskell and Scala approach that I wrote about?

Shannon -jj Behrens said...

> As for kicking anyone out of the community, well, no one said that.

I was referring to the comment "Maybe Ruby just isn't the right language for you?"

> All Mark meant is: why torture yourself if it goes so against the grain?

I'm a language guy. I code in a lot of languages. If one language has a better technique for a particular problem, I think it's useful to point it out. Personally, I think the way Scala (using Option) and Haskell (using Maybe) approach the problem is fascinating.

> There should be a reason why you decided to make this post (and post it on Ruby Reflector).

I write about both Ruby and Python on this blog. Most of my readers are Python programmers. I don't even know what Ruby Reflector is, and I didn't know this blog post was on it.

> Did you really expect that everyone would agree with you and say "oh, right, how did we live all these years"?

No, I suspected that there would be push back, but I felt like I had to make my point anyway. I'm not actually trying to pick a fight. Notice that the very first sentence in the post was "The fact is, I really like Ruby."

I'm trying to pick a feature and write intelligently about how different languages handle it. However, it seems that many people aren't really addressing what I actually said.

I said that the guy who invented the concept of null said it was a bad idea. I thought that that was a pretty interesting thing.

I'm saying that Scala and Haskell have better approaches. I thought that those two things would be useful insights.

> When you post insights like this, you have to expect some argument and smirks, while 99.99% of the world will simply ignore it.

Unfortunately, that's true. However, some of the comments were really helpful. For instance, Sean Huber showed me that it might be possible to write a gem that changes the behavior of Ruby for all the cases I care about. That reminds me of "use strict;" in Perl. It wasn't there originally. I was hoping that my blog post might instigate the creation of a "use strict" for Ruby.

Shannon -jj Behrens said...

Dmytrii Nagirniak wrote:

The problem you are describing is imaginary and not backed by real world scenarios.

I came to Ruby from .NET and it has always been a pain to handle non existing keys.
I waste 3 lines of code to check for existence of the keys.

99% of the time I DID need NIL instead of exception.
In Ruby the 1% can be handled using fetch method.

Shannon -jj Behrens said...

> The problem you are describing is imaginary and not backed by real world scenarios.

I've spent the last 1.5 years as the lead engineer at a startup using Ruby on Rails. We built fandor.com. During that 1.5 years, I often encountered exceptions that looked like "NoMethodError: undefined method `foo' for nil:NilClass". It always leaves me wondering, how did that nil get there? That's certainly a "real world scenario" as far as I'm concerned because it happened to me in the real world ;)

The problem is that I have a nil somewhere where I didn't expect it. I would have preferred Ruby to raise an exception at the source of the problem (maybe when I looked up something in a hash) rather than later (when I try to use the thing I looked up).

> I came to Ruby from .NET and it has always been a pain to handle non existing keys.
I waste 3 lines of code to check for existence of the keys.

Sorry, I don't know how .NET handles this. I do know how Python handles it, and it doesn't require 3 lines of code. I just write d["foo"]. If "foo" doesn't exist, I'll get an exception. If I want Ruby's behavior, I write d.get("foo"). The default is to be strict. That's all I'm arguing for--I want the default to be strict.

> 99% of the time I DID need NIL instead of exception.

I think we simply disagree on the "99%". Most of the time, I want the system to be strict. Once in a while, I want it to be lenient; in those cases, I'd prefer to explicitly tell it I want it to be lenient.

> In Ruby the 1% can be handled using fetch method.

I'm just disagreeing with the default. I think that it's particularly the case with instance variables. If I write @a, and @a isn't set yet, it's very likely a bug, and I want Ruby to raise an exception.

If I write "self.a" in Python when "self.a" hasn't be set, I get an AttributeError. That's really helpful for finding bugs at their source instead of allowing them to fester.

Thanks for the comment.

Shannon -jj Behrens said...

Maybe I can explain it this way. The problem with nil is that it often pops up at times when you don't expect it. This leads to bugs. In C, they're called NULL pointer dereferences and they lead to core dumps.

This problem even happens in Java, which has an extremely pedantic compiler. Any good Java programmer can tell you of a time where a null got to a place in the code where he didn't expect it.

If you think of the number of bugs this problem has caused in all the languages that support NULL, you'll see why Tony Hoare called it a billion dollar mistake. Tony Hoare isn't some random blogger spouting off useless opinions like me ;) He's a famous computer scientist.

These days, languages like Haskell and Scala have new approaches that can get rid of this whole class of bugs. For instance, Haskell's Maybe monad prevents nulls from getting anywhere they aren't expected, and the compiler will catch you if you don't write code that can handle the null. Most of the time, null isn't allowed, and you don't have to write any special code to handle it.

Python and Ruby don't have a compiler to apply such checks. However, Python will rely on exceptions for certain "exceptional situations" whereas Ruby falls back to using nil. I really like Ruby, but I think this is one time where I prefer Python's behavior.

Of course, that's just my opinion, and you're free to disagree.

Shannon -jj Behrens said...

Rust is a new programming language coming out of Mozilla that is similar to C++:

http://en.wikipedia.org/wiki/Rust_(programming_language)

I found this line interesting, "The system is designed to be memory safe, and does not permit null pointers or dangling pointers. Data values can only be initialized through a fixed set of forms, all of which require their inputs to be already initialized."

Mark Wilden said...

There has long been a huge debate in the database world about NULL. Most people like having it; some don't. But this is a different discussion. This is about whether or not Ruby hashes should throw an exception instead of returning nil.

I like the current behavior, and it seems that the majority do as well. You don't, but you can obtain the behavior you want with #fetch.

So what's the problem? Are you saying that we should not like the current behavior? Is your post about Ruby or about the Ruby community?

Shannon -jj Behrens said...

> There has long been a huge debate in the database world about NULL. Most people like having it; some don't. But this is a different discussion. This is about whether or not Ruby hashes should throw an exception instead of returning nil.

In the case of databases, I wish the default was NOT NULL. However, it doesn't usually burn me as badly because I usually remember to add NOT NULL if that's the behavior I want. Note also that NOT NULL is determined once at table creation time, not at query time.

> I like the current behavior, and it seems that the majority do as well. You don't, but you can obtain the behavior you want with #fetch.

I think Martin made a good point on this topic. The problem is that [] is the default operator for getting something out of a hash, and it returns nil. Many gem creators use [] without thinking of the ramifications. That means that even if I always remember to use .fetch(), it doesn't rescue me from gems that don't.

> So what's the problem? Are you saying that we should not like the current behavior?

I'm just trying to explain my opinion on the subject and provide some interesting commentary in comparing the situation to other languages.

I do think it would be cool to create a gem that monkey patches Hash and Object to make things strict by default. Unfortunately, I suspect that this would break other things like Rails that may be expecting less strict behavior.

Remember when Rails 3 made it so that templates escaped HTML by default? I feel the same way about nil--it'd be nice if things were a little bit stricter by default.

> Is your post about Ruby

It's about Ruby's use of nil.

> or about the Ruby community?

I actually really like Ruby's community. Programmers in the Ruby web world seem to work together much better than those in the Python web world. I love the fact that Merb merged back into Rails. I love Ruby Toolbox. I love the fact that I can go to a Ruby on Rails company and understand the code within hours rather than days.

Mark Wilden said...

> In the case of databases, I wish the default was NOT NULL.

No, I was talking about whether NULL should even exist. There are some heavyweights in the database world who do not believe in the entire concept of NULL.

> That means that even if I always remember to use .fetch(), it doesn't rescue me from gems that don't.

It didn't really seem that it was a problem with gems that you were addressing. It seemed more that you wanted to be protected from your own (mis)use of the behavior.

In any event, the gems I use are so 1) well-written, 2) well-tested, and 3) widely-used that I don't think I've ever had a problem with their use of this behavior. Certainly not enough to require a change in a fundamental Ruby class.

I didn't make myself clear about "the Ruby community." I'm not saying you don't like the community or that you shouldn't belong to it. I'm simply saying that you don't like the fact that we use Hash#[] instead of Hash#fetch, where appropriate. Because if we did, this argument would be moot.

In other words, if a piece of code should raise an exception when a nonexistent hash value is accessed, then the programmer should simply use #fetch. If he doesn't, he's created a bug. Does that mean the language should be changed? Especially when most of the time the existing behavior is correct?

Shannon -jj Behrens said...

> No, I was talking about whether NULL should even exist. There are some heavyweights in the database world who do not believe in the entire concept of NULL.

I wasn't aware of that. I don't have a problem with NULL in databases. I just wish "NOT NULL" was the default.

> It didn't really seem that it was a problem with gems that you were addressing. It seemed more that you wanted to be protected from your own (mis)use of the behavior.

I'm talking about everyone's code, not my own.

> I'm simply saying that you don't like the fact that we use Hash#[] instead of Hash#fetch, where appropriate. Because if we did, this argument would be moot.

I strongly suspect that it is fairly common for developers to use Hash#[] when they should use Hash#fetch instead. I'm only an intermediate level Ruby programmer. However, I've read "Ruby for Rails" and "Agile Web Development with Rails" cover-to-cover, and neither book made use of Hash#fetch as far as I could tell.

> In other words, if a piece of code should raise an exception when a nonexistent hash value is accessed, then the programmer should simply use #fetch. If he doesn't, he's created a bug. Does that mean the language should be changed? Especially when most of the time the existing behavior is correct?

Remember that I'm only arguing about default behaviors. I think Hash#[] should be strict by default. I think that making use of an uninitialized instance variable should raise an exception by default, etc.

Shannon -jj Behrens said...

Eiffel 6.4 has removed this problem. They say that Eiffel is "void-safe". In "Masterminds of Programming" Bertrand Meyer said, "I think it is a major achievement because it removes the major potential runtime problem that still exists with object-oriented development. This is the kind of thing that we want to do to increase the reliability of software developers." [p. 438]

Chad Woolley said...

If you don't like it, override Hash#[](key) to raise an error if the key doesn't exist, and even have the error message helpfully tell you what keys you can use. Yay Ruby ;)
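
Something along those lines might look like this (a rough sketch, not a drop-in solution; raw_lookup is just an alias name I made up, and a global monkey patch like this would break any code that relies on [] returning nil):

class Hash
  alias_method :raw_lookup, :[]   # keep the original nil-returning lookup around

  # Strict lookup: raise instead of returning nil for missing keys.
  def [](key)
    unless key?(key)
      raise KeyError, "key not found: #{key.inspect} (known keys: #{keys.inspect})"
    end
    raw_lookup(key)
  end
end

h = {:foo => 1}
h[:foo]    # => 1
h[:food]   # raises KeyError: key not found: :food (known keys: [:foo])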