Tuesday, September 01, 2009

My Take on Writing Code that Doesn't Suck

I just finished watching Writing Code That Doesn't Suck by Yehuda Katz. I think he also could have called it "Writing Tests That Don't Suck". I'm going to write a summary of some of his points (along with some of my own notes). He covers a lot of the things I've been thinking about over the last couple of months as a Python coder working in Rails using behavior-driven development (BDD).

Most Rails programmers agree that tests are good. However, we don't spend enough time considering which tests are the most valuable. In fact, Yehuda points out that a lot of tests are quite useless.

Unit-tested code doesn't mean bug-free code. In my own experience, I'm often so tired after writing automated tests that I don't have enough mental energy left to do solid exploratory testing. Having someone other than the programmer do some serious exploratory testing is necessary.

alias_method_chain is bad. Having a chain of modules monkeypatching the same method leads to a brittle modularity nightmare that is unmaintainable and unreusable. It's also a sign of a poorly designed API. Just because Ruby lets you do things like that doesn't mean we should be content with poorly designed APIs.
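
To make the complaint concrete, here's a minimal sketch of the pattern he's criticizing (my example, not his, written with plain alias_method so it runs without ActiveSupport): two modules rewire the same method, and the result only holds together as long as the include order and the "without" aliases never change.

    module Logging
      def self.included(base)
        base.class_eval do
          alias_method :save_without_logging, :save
          alias_method :save, :save_with_logging
        end
      end

      def save_with_logging
        puts "saving #{self.class}"
        save_without_logging
      end
    end

    module Timestamps
      def self.included(base)
        base.class_eval do
          alias_method :save_without_timestamps, :save
          alias_method :save, :save_with_timestamps
        end
      end

      def save_with_timestamps
        @saved_at = Time.now
        save_without_timestamps
      end
    end

    class Record
      def save
        true
      end

      include Logging     # each include rewires #save in place...
      include Timestamps  # ...so the behavior depends on include order
    end

    Record.new.save  # prints "saving Record", sets @saved_at, returns true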

Matz specifically doesn't want to add interfaces to Ruby. However, Yehuda argues that interface-oriented design is a very good thing. Yehuda reminds us that everywhere other than Ruby, coders know that good design means coding to an interface, not to an implementation.

You might have code and unit tests that test that code. You might refactor them both together, and all your tests might pass. However, if you break the API to your code, you've broken everyone else's code that is reliant on that interface. That's why regression testing is important. It doesn't matter how the implementation works--the only thing that matters is that your interface continues to behave as promised.

That's why testing the external interface is so much more important than unit testing the internal implementation. Heavily mocked and stubbed unit tests just aren't as useful as full-stack testing (with, say, Cucumber and Webrat). I personally think there's nothing worse than a controller, a model, and a view that are tested individually using mocks and stubs but that blow up when combined. That's not to say that unit testing isn't useful--it's just that what I would call integration testing or what Yehuda would call regression testing is even more useful. Test what you actually care about--i.e., the external behavior.
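
As a rough illustration, here's a sketch of Cucumber step definitions that drive the whole stack (routing, controller, model, and view together) through Webrat--the feature, paths, and labels are hypothetical:

    # features/step_definitions/signup_steps.rb
    Given /^I am on the signup page$/ do
      visit new_user_path
    end

    When /^I sign up as "([^\"]*)"$/ do |email|
      fill_in "Email", :with => email
      fill_in "Password", :with => "secret"
      click_button "Sign up"
    end

    Then /^I should see a welcome message$/ do
      response.should contain("Welcome")
    end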

"some_object.should respond_to(:some_method)" is not a useful test. If you're not testing what some_method actually does, then why bother, and if you are testing what some_method actually does, then you don't need to test for its existence. I think that "some_object.should respond_to(:some_method)" is sort of like sunbathing--it makes you feel better about yourself and (arguably) makes you look good, but it's not helpful for your overall health and well being.

At the risk of beating a dead horse, when I see tests like "some_object.should respond_to(:some_method)", it really makes me wish I were coding in Haskell. Haskell's type system can enforce interfaces like this without writing boring, do-nothing tests. Perhaps that's why so many Ruby coders have switched to Scala.

Yehuda does think that TDD with unit tests is helpful for designing your code, but those tests aren't good as regression tests. After your code is working, you should write regression tests that cover the things you actually care about, not the irrelevant implementation details.

Yehuda doesn't have a computer science background. Hence, he is relearning a lot of the lessons that MIT's Structure and Interpretation of Computer Programs has been teaching for decades, and I think that's good. It seems like much of the Rails world has forgotten a lot of the things we've learned about modularity, encapsulation, and good design over the last 30 years. Hopefully, thanks to Yehuda and Merb, that will improve.

It will be interesting to watch the synthesis of the good design taught by SICP with the strong commitment to testing that is popular in the Rails world.

6 comments:

Rafael Noronha said...

I agree that what really matters is the external behavior, and that integration tests are more important than unit tests.

Complete coverage with integration and acceptance tests is probably much safer and smarter than complete coverage with unit tests.

Yes, unit tests still play their role in guiding the design with practices like TDD.

j_king said...

How about applications that don't require such radical separation? Sometimes it makes sense. But most web frameworks these days take it to an extreme. The amount of serialization/deserialization between the persistence layer and the end user is actually quite amazing. It's almost like a teetering Tower of Babel.

Regression testing isn't a silver bullet either.

It has to be used in concert with other testing methods. You have to test with a big ol' shotgun. The most heinous bugs are the ones you didn't think to test for (yay race conditions, fuzzy inputs, etc.).

Of course, for all the benefits of dynamic typing, the downfall is pretty obvious: you have to test for some pretty primitive errors.

Kelly Felkins said...

Regarding interfaces, and testing interfaces: I think this goes without saying. But for most of us it is not that much of an issue. Most of us are not building frameworks used by others. Most of us are clients of the interface, not producers.

You single out the test "some_object.should respond_to(:some_method)". I do that all the time. Everyone has a different threshold for what they test. To me it is part of a process.

It's like playing blackjack. In blackjack, you should never think while playing the game -- every play is predetermined and made in response to specific circumstances.

I like to test drive *everything*. I don't want judgement getting between me and my code.

Why? Well, we developers have a tendency to overthink and overengineer. You don't overthink or overengineer if you are truly test driven. Does that object need some_method? Then write the test, see it fail, then add the method and see it pass. When you follow this simple process, you will write only what is needed. If you don't follow this process, you will be tempted to write a little more. (I know I need some_method, and I also have this other thing that needs to do something similar, so I'll just go ahead and make some_method some_bigger_method right now...)
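
Something like this, roughly (the Cart class and spec are made up), just to make the sequence concrete:

    # Step 1: write the spec first and watch it fail (red), because
    # Cart#add_item doesn't exist yet.
    describe "Cart" do
      it "collects added items" do
        cart = Cart.new
        cart.add_item("apple")
        cart.items.should == ["apple"]
      end
    end

    # Step 2: add just enough code to make the spec pass (green) -- and
    # resist the urge to write the bigger method you think you'll need later.
    class Cart
      attr_reader :items

      def initialize
        @items = []
      end

      def add_item(item)
        @items << item
      end
    end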

My peers at Pivotal have taught me a lot about testing. One of the lessons is: don't test the same thing twice (although I've never heard it said like that). Basically, if you verify in your unit test that your method does what it is supposed to do, then you don't need to verify that in any other test -- in fact, you don't even need to call the actual method.

Some people take this as a rule to mock out everything. I don't. I mock things out because I have some need for isolation. If I am testing a view, and that view invokes a helper, and that helper has been tested, I don't need to mock that helper out when I test the view. But sometimes there is a compelling reason to mock it out. Sometimes you haven't implemented the helper yet. Sometimes that method is expensive or slow. Sometimes the specifics of that method will likely change and you want to isolate that change from the rest of your test suite.
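
For what it's worth, here's a rough sketch of that judgment call in the rspec-rails view-spec style of that era (the template, model, and helper names are hypothetical): the helper gets stubbed because there's a concrete reason to isolate it, not as a matter of habit.

    describe "products/show.html.erb" do
      before do
        assigns[:product] = Product.new(:name => "Widget")
        # The real formatted_price helper calls out to a slow currency
        # service, so stub it here for speed and isolation.
        template.stub!(:formatted_price).and_return("$9.99")
      end

      it "shows the product's name and price" do
        render "products/show.html.erb"
        response.should have_tag("h1", "Widget")
        response.should have_tag(".price", "$9.99")
      end
    end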

Also, your tests and your test suites should run quickly. If they are slow, you are less inclined to run them. As our projects grow and our test suites take longer to run, things start breaking down. Eventually we do something to speed up the tests.

Mocking and stubbing allow you to write specific tests. These specific tests often run very quickly. When they fail, you have a laser beam pointing at the problem. If your integration test fails, you have a flood light over the entire code base. I once called Selenium tests the "daisy-cutter" of testing -- high level, so if something's not right it's going to be exposed by the broad test coverage, but you have to sift through the debris to try to find the thing that caused the failure.

These are scale issues. On a small project with a small team it is probably fine and effective to test mostly at the integration level. As the projects grow it becomes more important for tests to be fast and specific.

Shannon -jj Behrens said...

> You single out the test "some_object.should respond_to(:some_method)". I do that all the time.

Yes, I singled that one out on purpose, because I knew it was common practice at Pivotal, yet Yehuda Katz is highly critical of it in the video. It's evidence of a deep philosophical divide between what he would call Merb 0.5 testing (which is what I've seen Pivotal practice) vs. Merb 0.9 testing. I think it's worth your time to watch the video. Heck, I thought it was useful enough that I watched it twice, and wrote this blog post ;)

JESii said...

Thanks for an excellent follow-up to Yehuda's presentation; good stuff.

The point that hit home for me was this: "... is re-learning a lot of the lessons" is a recurring theme in computing. I've been programming for over 40 years, starting with IBM assembler and progressing through many different languages. Even though I don't have a CS degree (they didn't offer one when I was getting my BS), I've seen this over and over. I well recall reading the early issues of Byte, where new programmers proudly produced their wonderful new Basic programs. As a professional programmer, I looked at their results with horror: most were masses of jumbled GOTOs and totally impenetrable logic -- even though they were small programs. I can't imagine what some of the larger programs looked like, but I have a good guess. Or think back to the days when Access broke onto the scene and well-meaning but untrained office workers started writing individual applications that morphed into multi-user "systems" that were unreliable and unmaintainable, yet critical to operations (I evaluated one such system which had become THE primary system for processing new orders at a major telecommunications company, yet had no sound basis for multi-user, multi-site operation).

It seems to me that part of the "problem" is a result of computing's success: computing power continues to be more available and more ubiquitous than we ever thought possible. As a result, more and more people -- most of them without any CS training -- are given the opportunity to write software. Because of the greatly increased capability of the tools, they are able to quickly and relatively painlessly get results from their efforts... and so, a new programmer is born. I imagine that many of today's programmers might not have been so jazzed and ready to continue if they had to tediously and painfully enter their program into the computer through console front-panel switches and read the lights to make sure that they had entered the code correctly.

There's also the "problem" that newer tools (PC Basic, MS Access, Ruby, Rails...) just can't immediately start out with the full-blown capabilities that other more mature languages might have. I've heard Smalltalk programmers (at a past Ruby Music Meetup) compliment Ruby while bemoaning the lack of development, testing, and debugging capabilities -- smoothly integrated -- that Smalltalk offered them. Their assessment was that it would take several years of development before Ruby would meet the maturity level and capabilities that Smalltalk already provided. Makes sense: Smalltalk has been developed, tested, and expanded for many more years than Ruby has.

And of course, many of these new tools often reinvent the wheel without realizing that many of the vaunted capabilities existed long before. One recent example is the concept of "FIT" (as in Framework for Integrated Test by Ward Cunningham), which is implemented in Cucumber as Scenario tables (Aslak Hellesoy said in a post: "Let's stop calling tables "FIT tables". Tables are *inspired* from the FIT tool, but they *are not* FIT tables. Better names: Step tables and Scenario tables.") Whatever you call them now, they seem to me to be clearly inspired by Decision Tables, which were around way back in the '70s and have been largely forgotten.

I've also seen many cases where new programmers using new tools and capabilities somehow believe that those new capabilities completely free them from the constraints of the old paradigms. There's often been a sense of disdain on the part of the new crowd for the old crowd... and vice versa. Different folks start from a different point and think their way is the best, which leads to a lessening of cross-fertilization of ideas.

Yehuda's comments about alias_method_chain remind me of similar complaints from the past: "A Case against the GO TO Statement" by Dijkstra and the outright banning of "ALTER GOTO" in the COBOL community.

So... let's keep re-learning the old lessons; sometimes they ain't so bad.

Shannon -jj Behrens said...

Yep. Thanks for the comment.

I've often joked that all you have to do to be a success in the Python world is to grab a cool feature from Common Lisp and port it to Python ;)

However, I do think that the fanatical attachment to automated tests written by the programmers themselves is somewhat new. I surely didn't see that in the SICP videos. I also like the way that Cucumber tests are, to some small degree, an executable version of literate programming.

I'd really like to see the following three things combined:

* The good design sense from SICP.
* The strong testing ethic in Rails.
* An extremely sophisticated compiler, a la Haskell's GHC.