Wednesday, March 18, 2009

Personal: Offline

I'm going to try to go offline for about a week. Call me if you need me.

Tuesday, March 17, 2009

Oz: Concurrency Doesn't Have to be Hard

I'm reading a book called Concepts, Techniques, and Models of Computer Programming which is being hailed as following in the tradition of Structure and Interpretation of Computer Programs.

One of the things that I found interesting about the book is that it says that concurrency doesn't have to be hard:
Concurrency in Java is complex to use and expensive in computational resources. Because of these diffculties, Java-taught programmers conclude that concurrency is a fundamentally complex and expensive concept. Program specifications are designed around the diffculties, often in a contorted way. But these diffculties are not fundamental at all. There are forms of concurrency that are quite useful and yet as easy to program with as sequential programs (for example, stream programming as exemplified by Unix pipes). Furthermore, it is possible to implement threads, the basic unit of concurrency, almost as cheaply as procedure calls. If the programmer were taught about concurrency in the correct way, then he or she would be able to specify for and program in systems without concurrency restrictions (including improved versions of Java). [p. 30]
The main programming language taught in the book is the multi-paradigm programming language Oz, which is related to the programming language Alice. I've been a fan of Erlang-style concurrency for years, but I was amazed to see that concurrency based on "promises" is pretty nifty as well. Here's an example I wrote:
declare X Y in
thread
{Delay 10000}
X=10
end
thread
{Delay 5000}
Y=2
end
{Browse starting}
{Browse X*Y}
In this code, X and Y are two values that are calculated on separate threads. The main thread wishes to show the result of X*Y. However, it automatically waits until X and Y have been calculated. It outputs "starting" and then waits for 10 seconds until both X and Y have been set. Notice that the synchronization is completely implicit.

Thursday, March 12, 2009

Books: JavaScript: The Good Parts (Part 2)

I just finished reading JavaScript: The Good Parts. The book weighed in at a mere 145 pages and was written by one of the premier experts in the field of JavaScript. To put it simply, if you code JavaScript and if you read books, you should read this one. Not since K & R's C Programming Language have I seen a book that was so slim and yet so useful.

Previously, I blogged about a lot of the things I learned in the book. In this post, I'm going to concentrate on some specific questions, some disagreements, and some high-level discussion. Let me make it very clear that while I may occasionally disagree on a minor point, I really enjoyed reading this book!
Questions
My first question has to do with "augmenting basic types" (p. 33). I was a bit surprised to see Crockford adding a "trim" method to the "String" type. I thought that a lot of JavaScript communities considered this bad practice. He shows how to add a method conditionally, but I am suspicious of that too.

If two different "applications" on a page both try to augment the same type using the same method name but with different signatures or semantics, won't this lead to difficult to diagnose bugs? Won't adding the method conditionally (as shown on p. 33) merely delay discovery of the bug? Sooner or later, one of the applications is going to try to call the wrong method, and things are going to break in a subtle way. If you're going to do something fairly "global" like augmenting a built in type, I wonder if it might be better to use the shameful trick of adding a prefix to your method name, for instance "String.yui_trim".

My next question has to do with inheritance. Crockford shows how to make use of prototypal inheritance using his "beget" method, and he also shows how to use "differential inheritance" without the use of prototypes (p. 52). When he explains the functional approach to modules, he says:
There are lots of ways to make an object. It can make an object literal, or it can call a constructor function with the new prefix, or it can use the Object.beget method to make a new instance from an existing object, or it can call any function that returns an object.
I like Crockford's functional approach to modules. It like his "beget" method. I also like the differential inheritance approach where you let the superclass actually create the object, and then you just add to the object directly, without using the prototype system.

However, I didn't feel like he explained why and when you would prefer prototypal inheritance over differential inheritance. The most I can say is that if you have a ton of "small" objects with a lot of methods, differential inheritance is probably more expensive since each object must duplicate the references to each function.

I have another question regarding his module pattern. On p. 52-53, he explains how to use a "my" object as a container for secrets that are shared by the constructors in the inheritance chain. I understand the concept. However, is this really necessary if you're using differential inheritance (i.e. one object instead of a prototype chain)? Is this really effective for "secrets"? After all, when you call the constructor, you can pass your own my object so that you can harvest any secrets that get put into it.

My next question has to do with regular expressions. Crockford explains positive lookahead groups ("(?=") as well as negative lookahead groups ("(?|"), but then he says "This is not a good part." (p. 75) Why is that? Do they suffer from implementation variances or performance problems?

On p. 140, Crockford says:
There is another danger in the interaction between external data and innerHTML. A common Ajax pattern is for the server to send an HTML text fragment that gets assigned to the innerHTML property of an HTML element. This is a very bad practice. If the HTML text contains a script tag or its equivalent, then an evil script will run. This again can be due to server incompetance.
Any website that asks for data from a user and then later shows him that data must compensate for the risk of CSS vulnerabilities. This is true regardless of whether or not you're using JavaScript. Does innerHTML somehow make the problem worse? I thought that innerHTML was more "unofficially" standard and a lot less likely to suffer from incompatibility problems than making heavy use of DOM manipulation to add content to a page. Is that not true?
Disagreements
Crockford says, "I reserve block comments for formal documentation and for commenting out." (p. 96) However, on p. 6, Crockford points out that you can't use block comments to comment out code if it contains a regex such as /a*/. It's probably better to stick to line comments for commenting out code.

Crockford says that you should never fall through a case in a switch statement. I agree that the syntax for switch statements is tragically flawed. However, I agree with the BSD style guide. If you are going to fall through a case in a switch statement, use a "/* FALLTHROUGH */" comment. For JavaScript, I'd recommend "// FALLTHROUGH". JSLint could easily forbid code that falls through unless it contains such a comment.

Crockford calls the ++ and -- operators bad parts on p. 112. I agree that overuse of ++ and -- can lead to terse code. If you have to think about pre-increment vs. post-increment, you may be favoring conciseness over readability. However, I see nothing wrong with using ++ or -- on a line by itself or in the last part of a for statement. Consider the following:
for (i = 0; i < list.length; i++) {
if (something) {
count++;
}
}
I don't think that code is unreadable. In fact, I think it's far more readable than the somewhat evil use of the ternary operator shown on p. 145:
return typeof reviver === 'function' ?
function walk(holder, key) {
var k, v, value = holder[key];
if (value && typeof value === 'object') {
for (k in value) {
if (Object.hasOwnProperty.call(value, k)) {
v = walk(value, k);
if (v !== undefined) {
value[k] = v;
} else {
delete value[k];
}
}
}
}
return reviver.call(holder, key, value);
}({'': result}, '') : result;
Defining a multi-line function in the middle of a ternary operator is just wrong ;)
Final Thoughts
As I said earlier, I really liked this book. My complaints are minor at best.

There's something that Crockford said that stands out in my mind:
We all find the good parts in the products that we use. We value simplicity, and when simplicity isn't offered to us, we make it ourselves...We cope with complexity of feature-driven design by finding and sticking with the good parts. It would be nice if products and programming languages were designed to have only good parts. [p. 100]
That's a nice quote. My question is, who gets to pick the subset? If you're coding in Perl or C++, you almost have to pick a subset, but every company picks a different one. Clearly in JavaScript, I'm happy to let Crockford pick the subset, but what about other languages? Perhaps we need more "The Good Parts" books.

Books: JavaScript: The Good Parts

These are some things that I found interesting, surprising, or profound when I read JavaScript: The Good Parts. Test your JavaScript Fu by seeing how many of these you already know!
// I'm doing my best to match Crockford's style.  The code passes JSLint,
// except in cases where I'm showing what *not* to do.
//
// I'm using a bunch of anonymous functions as blocks just so that the namespace
// doesn't get out of hand. Sorry if that's confusing.
//
// I'm allowing evil (i.e. eval) just so that I can use document.writeln. I'm
// not actually using eval itself anywhere.

/*jslint browser: true, evil: true */

// This is a method to create a method.

Function.prototype.method = function (name, func) {
this.prototype[name] = func;
return this;
};

// Create a new object using the given object as a prototype.

Object.beget = function (proto) {
var F = function () {};
F.prototype = proto;
return new F();
};

// Unlike Python and Ruby, two strings with the same value are considered
// identitical.

var s1 = 'c';
s1 += 'at';
var identical = s1 === 'cat'; // true
document.writeln('Strings with the same value are identical: ' + identical);

// Blocks don't introduce a new scope. Only functions do.

(function () {
if (true) {
var s = true;
}
document.writeln("Blocks don't have their own scope: " + s); // true
}());

// Test the for in loop.

(function () {
var prop,
obj = new Function();
obj.foo = 'bar';

document.writeln('Here it is with all the properties:');
for (prop in obj) {

// This has "method" defined from above.

document.writeln(" " + prop);
}

document.writeln('Filter out properties using hasOwnProperty:');
for (prop in obj) {

// This one doesn't.

if (obj.hasOwnProperty(prop)) {
document.writeln(" " + prop);
}
}

document.writeln('Filter out properties using typeof:');
for (prop in obj) {

// Neither does this one.

if (typeof obj[prop] !== 'function') {
document.writeln(" " + prop);
}
}
}());

// Test try and catch.

(function () {
document.writeln('Test exceptions:');
try {

// It's best to include a name and a message.'

throw {
name: 'TypeError',
message: 'number expected'
};
} catch (e) {
document.writeln(' name: ' + e.name);
document.writeln(' message: ' + e.message);
}
}());

// My buddy Leon Atkinson always says that if you don't have the identity
// property, then nothing makes sense. The reason JavaScript is so wonky is
// because all numbers are doubles, and NaN violates the identity property ;)

document.writeln("NaN === NaN: " + (NaN === NaN)); // false

// However, NaN is a big fat liar! NaN is "not a number", but if you ask it,
// it'll lie ;)

document.writeln("typeof NaN: " + typeof NaN); // number

// These are a bit strange.

document.writeln('typeof []: ' + (typeof [])); // Object
document.writeln('typeof null: ' + (typeof null)); // Object

// Unlike Java, all numbers are doubles even if they look like ints.

document.writeln("2 / 5: " + 2 / 5); // 0.4

// Even a function literal can use a name in case you need recursion.

(function function_literal() {
document.writeln("function_literal: " + function_literal);
}());

// Unlike Ruby, "" is falsy.

if ("") {
document.writeln("'' is truthy"); // nope
} else {
document.writeln("'' is falsy"); // yep
}

// Nested functions don't inherit "this" from the outer function. Instead, it's
// bound to the global object. Use "that" as a workaround.

var globalThis = this;

var objWithNestedFunction = {
test_nested_function_this: function () {
var that = this,
nested = function () {
document.writeln('In nested(), this === globalThis: ' +
(this === globalThis)); // true
document.writeln('In nested(), this === window: ' +
(this === window)); // true
document.writeln('In nested(), this === that: ' +
(this === that)); // false
};

nested();
}
};

objWithNestedFunction.test_nested_function_this();

// The book warns against using the constructor invocation pattern. If you
// forget to use "new", then "this" will silently be bound to the global object.
// If you do use them, it's imperative to capitalize them. It's a reminder
// to use "new".

(function () {
document.writeln('Called a constructor without new; this === globalThis: ' +
(this === globalThis)); // true
}());

// Use apply to pass a list of args and control what object should be used
// for "this".

var uses_this = function () {
var i;
document.writeln("uses_this.apply(...):");
document.writeln(" this.a: " + this.a); // 'a'
document.writeln(" arguments:");
for (i = 0; i &lt; arguments.length; i += 1) {
document.writeln(" " + arguments[i]); // 1 and 2
}
};

var applyToThis = {
a: 'a'
};

uses_this.apply(applyToThis, ['1', '2']);

// The arguments function returns something that is array-like, but not a true
// array.

(function () {
document.writeln("typeof [].join: " + typeof [].join); // function
document.writeln("typeof arguments.join: " +
typeof arguments.join); // undefined
}(1, 2, 3));

// If you use a constructor with "new", you can return some object. If you
// don't, "this" is returned implicitly.

var ReturnSomethingElse = function () {
this.b = 'b';
return {
a: 'a',
'this': this
};
};

var somethingElse = new ReturnSomethingElse();
document.writeln("Returned something other than this:");
document.writeln(" somethingElse.a: " + somethingElse.a); // 'a'
document.writeln(" somethingElse['this'].b: " +
somethingElse['this'].b); // 'b'

// Instead of using beget, he recommends using "differential inheritance". See
// p. 53 for the full pattern.

var mammal = function (spec) {
var that = {};

that.speak = function () {
document.writeln("I'm mute.");
};

return that;
};

var cat = function (spec) {
var that = mammal(spec);

that.meow = function () {
document.writeln("Meow.");
};

return that;
};

var myCat = cat({some: 'keyword args'});
myCat.speak(); // I'm mute.
myCat.meow(); // Meow.

// He calls this a "part". In other languages, I think you'd call it a mixin.

var friendliness = function (that) {

that.speak = function () {
document.writeln("Hi! I'm " + that.name);
};

return that;
};

var person = {name: 'JJ'};
friendliness(person);
person.speak();

// "An array is a linear allocation of memory in which elements are accessed
// by integers that are used to compute offsets. Arrays can be very fast data
// structures. Unfortunately, JavaScript does not have anything like this
// kind of array." [p. 58]

// JavaScript implements arrays using hashes.

(function () {
var i,
array = [0, -1];

// Notice, this would be considered out of bounds using a real array.

document.writeln("array.length: " + array.length); // 2
array[100] = -100;
array[4] = -4;

// These will be all out of order, just like a normal hash.

document.writeln("array:");
for (i in array) {
if (array.hasOwnProperty(i)) {
document.writeln(" i: " + i + " array[i]: " + array[i] +
" typeof i: " + typeof i); // string
}
}
}());

// \b and \w can't handle I18N'ized text.

(function () {
var re = /^\b\w+\b$/;
document.writeln('aAa: ' + re.test('aAa')); // true
document.writeln('a\u00C1a: ' + re.test('a\u00C1a')); // false
}());

// By default, sort sorts numbers incorrectly.

(function () {
var numbers = [4, 8, 15, 16, 23, 42];
numbers.sort();
document.writeln("Incorrectly sorted numbers: ",
numbers.join(", ")); // 15, 16, 23, 4, 42, 8

// Here's how to sort them correctly.

numbers.sort(function (a, b) {
return a - b;
});
document.writeln("Correctly sorted numbers: ",
numbers.join(", ")); // 4, 8, 15, 16, 23, 42
}());

// Don't confuse slice(start, end) with splice(start, deleteCount, item...).
// slice uses end and is non-destructive. splice uses deleteCount and is
// destructive.

(function () {
var numbers = [0, 1, 2, 3, 4];
document.writeln("slice(1, 3): " + numbers.slice(1, 3)); // 1,2
document.writeln(" afterwards:" + numbers); // 0,1,2,3,4
document.writeln("splice(1, 3):" + numbers.splice(1, 3)); // 1,2,3
document.writeln(" afterwards:" + numbers); // 0,4
}());

// Some implementations suppress empty strings in the output array when the
// separator is a regular expression.

(function () {
var orig = "1,,3,4",
rejoined = orig.split(/,/).join(',');
document.writeln("Rejoined:" + rejoined); // 1,,3,4 on Firefox, at least.
}());

// I do believe the Java style guide suggests using a blank line after a
// multi-line if statement. Crockford suggests indenting the lines after the
// first line 8 spaces instead of 4. I think that destroys the parallelism.

if (true === false &&
false === true) {
document.writeln("We're done for.");
}

// You can use a normal function before it's even defined because of function
// hoisting, although the book never uses normally named functions.

(function () {
f(); // works
try {
g(); // fails
} catch (e) {
document.writeln("Calling g ahead of time didn't work.");
}

function f() {
document.writeln("Calling f ahead of time works.");
}

var g = function () {
document.writeln("Calling g ahead of time won't work.");
};
}());

// Here's a really fun example of function hoisting. f will get hoisted to the
// top in Safari but not in Mozilla.

(function () {
try {
f();
} catch (e) {
document.writeln("f was not hoisted."); // Mozilla
}
if (false) {
function f() {
document.writeln("f was hoisted."); // Safari
}
}
}());

// Crockford recommends using hanging braces to avoid the following problem
// when returning objects. JavaScript messes up the following because of
// semicolon insertion.

(function () {
var return_object = function () {
return // A semi-colon will be insered by JavaScript here.
{ // This is treated as a new statement.
a: 'a'
};
};

document.writeln("return_object returned undef: " +
(typeof return_object() === 'undefined')); // true
}());

// Always use a radix when using parseInt. Otherwise, numbers that start with
// '0' will parsed as octal.

(function () {
document.writeln("parseInt('09'): " + parseInt('09')); // 0
document.writeln("parseInt('09', 10): " + parseInt('09', 10)); // 9
}());

// JavaScript uses doubles for all numbers. Integer arithmetic is exact with
// doubles. If you're dealing with money, your best bet is to scale things
// (e.g. use an integer for pennies) so that the arithmetic will be exact.

(function () {
document.writeln("0.1 + 0.2 === 0.3: " + (0.1 + 0.2 === 0.3)); // false
document.writeln("1 + 2 === 3: " + (1 + 2 === 3)); // true
}());

// If you use an object to keep track of the number of times something has
// happened before, you should probably remember to use hasOwnProperty.
// Otherwise, when you go to increment the count for "constructor", you'll get
// this crazy bug.

(function () {
var counts = {},
word = "constructor";
if (counts[word]) {
counts[word] += 1;
} else {
counts[word] = 1;
}
document.writeln("counts[word]: " + counts[word]); // Some weird string.
}());

// Always use === and !== instead of == and !=. Otherwise, you may be
// surprised by the casting that == and != do.

document.writeln("' \\t\\r\\n ' == 0: " + (' \t\r\n ' == 0)); // true
Here's the HTML for the above JavaScript:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

<html>
<body>
<pre><script src="program.js"></script></pre>
</body>
</html>

Monday, March 09, 2009

Apple: Bluetooth TCP/IP Network

My boss and I are at a coffee shop. We had to buy $5 worth of stuff for one of us to get a wireless connection. We both have Macs. Within 15 minutes, we were able to create a bluetooth network, and I'm sharing my wireless connection with him over bluetooth.

I'm positive this sort of thing would work in Linux too, but the fact that we got it working in 15 minutes without any documentation is truly a testament to Apple's user friendliness.

Wednesday, March 04, 2009

MySQL: Case-sensitivity Hell

After narrowly escaping Encoding Hell, I fell into Case-sensitivity Hell.

I work with data in big batches. This time around, I was starting with an empty database and importing some data. I noticed that it said that some rows were deleted, which is a funny thing to see when importing data into an empty database. I came up with a plausible explanation which I won't get into and just moved on.

During my batch processing, I pull down all the URLs from a table into a Python dict. My program was getting a KeyError because it couldn't find the URL "http://www.example.com/layout/keywords/Broken%20heart". I did a select statement in the MySQL shell, and I could see it.

Hmm, that's weird. I looked at the code for pulling down all the rows, and it looked fine. I did a count(*) on the table, and it matched the size of my dict. Very weird. I couldn't figure out why my dict didn't have the URL even though it should clearly be there.

After an agonizing, multi-hour debugging session, I finally explained the situation to my wife using an analogy, and she gave me a solution for my analogy.

I finally figured out that since the column was a TEXT column with the database-wide COLLATE set to utf8_unicode_ci, it was returning "http://www.example.com/layout/keywords/broken%20heart" even though I had asked for ""http://www.example.com/layout/keywords/Broken%20heart". See the difference in b vs. B? I didn't ;)

Then it dawned on me, MySQL was "deleting" rows during the MySQL import because it was ignoring URLs that only differed in case. Later, it was giving me one URL in my select statement even though I had searched for a different URL that differed by case. However, my Python dict was definitely not case insensitive, which is why I was getting KeyErrors.

Well, how do you tell MySQL that you want a column to be unique in a case sensitive manner? It turns out there is no COLLATE utf8_unicode_cs (cs stands for case-sensitive). Instead you have to use COLLATE utf8_bin which causes the sort order to behave weirdly as explained here. Apparently, Case-sensitivity Hell and Encoding Hell are next door neighbors.

Ugh, painful. The sad thing is that if you went back in time 30 years ago, you could probably find some other programmer griping about the fact that he just spent a day in Case-sensitivity Hell ;)

MySQL: Encoding Hell

Every once in a while, I end up in this weird place called Encoding Hell, and it takes me about a day to get out of it. Usually it's related to MySQLdb, the MySQL driver for Python; however, this time it was related to URLs.

I was trying to do a MySQL import. I kept getting a lot of warnings, which I usually try to fix. I couldn't even figure out what the warnings were because I was using the mysqlimport tool. After a while, I figured out that if you do the import from within the MySQL shell, you can run "SHOW WARNINGS;".

Anyway, I got a warning like "Incorrect string value: '\xF1os' for column 'category' at row 76997". I traced it back to a URL like "http://www.example.com/themes/keywords/southside%20sure%F1as". That's an ASCII URL, so I couldn't figure out what the problem was.

I had some code that was splitting the URL into two other parts. It used a regex to pull out the parts I wanted, and then it unurlencoded the parts. It turns out that once you unurlencode 'southside%20sure%F1as', you are left with 'southside sure\xf1as'. I tried to .decode('UTF-8') it, but it didn't work. I finally figured out, thanks to Vim's automatic encoding detection, that I needed to .decode('Latin-1') it, and I ended up with 'southside sureñas' (whatever that is).

What's interesting is that I started off with a perfectly fine ASCII URL and ended up with some Latin-1 that I wasn't expecting. That's a good reminder that user-submitted data is pervasively dangerous--those could have been control characters or something.

Tuesday, March 03, 2009

Computer Science: Faster, Smarter, More Robust

My wife and I like to play video games together cooperatively. Generally, she plays the exploratory parts, and I play the parts the require quick reflexes.

Recently, we've been enjoying a game called iNinja for the GameCube. (Thanks Ben Bangert!) As we neared the end of the game, we reached a level called "Egg Shell Skull" that seemed impossibly hard. We each spent a couple hours trying to beat it, but to no avail. It required 100% accuracy, very fast reflexes, and a little bit of multitasking. I could "program myself" to have fast reflexes with high accuracy, but every time I mixed in the multitasking, my reflexes went out the window.

Finally, I had the idea of playing the level together. I used the left side of the controller which involved ducking with 100% accuracy and very fast reflexes, while she focused on the right side of the controller which involved planning, combos, and monitoring a timer. After several tries, we finally beat the level :)

Since this is a purely technical blog, you might have guessed that I'm not trying to focus on how great a gamer I am ;) I'm trying to point out something. In a real-time system, you don't always have enough clock cycles for the best algorithm possible. That's doubly true for "coding in wetware". I was able to program my brain to have high accuracy and low latency, but only if the task was very, very simple. There's just not much decision making you can do in low latency situations. This is backed up by what little I know about the amygdala and the neo-cortex.

There's another case of this pattern that I find enlightening. I saw a talk from an air traffic controller, and he was explaining that there are three systems that are used simultaneously to help planes avoid hitting each other.

There is a very stupid and robust system that kicks in when two planes are about to collide. This system is in charge of implementing the correct "duck!" algorithm. The next system is in charge of avoiding a "loss of separation" within the next 5-7 minutes or so. It's more complicated, but doesn't have to worry about wind, etc. The highest level system is in charge of avoiding a loss of separation in the next 12 minutes. It is extremely complicated, and has to account for things like wind, etc. These separate systems have separate code bases, naturally.

In the future, I wonder how often we'll see cases like this. Sometimes, neither the simple, fast, and stupid solution nor the elegant but complex solution will do the trick. Sometimes you need both--just like the human brain needs both.