Tuesday, November 29, 2011

Computer Science: The Travelling Salesman Problem

I was thinking about the Travelling Salesman problem this morning. I came up with an algorithm that permits a few nice optimizations. My guess is that Knuth probably already came up with this algorithm, formally analyzed it, and then came up with 15 others that were much better. Nonetheless, I figured I'd jot it down.

Warning: This is a rough draft of an algorithm. You shouldn't bother reading this if you want real rigor. I'm just writing it down to get it off my chest.

Rather than picking the first segment of the journey and working my way to the end, I want to pick a middle segment of the journey and work my way outwards recursively.

Given a list of distances (or times, or costs, or whatever) between cities, sort the list from shortest distance to longest distance regardless of which cities are involved. Now, loop over this list and use each element of the list as a middle segment of your journey. This gives us a list of "first steps" (or rather, first middle steps). Looping over the list from shortest distance to longest distance is an important optimization because it increases the likelihood that we'll find an optimal path early, allowing us to skip over a lot of potential paths later.

Also sort the list of distances so that for each city, we have a list of other cities in ascending distance order. By sorting all of the city pairs in order of (first city, distance), you can use one sort for all of the pairs.
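Here's a minimal Python sketch of the two sorts described above. The names (`build_sorted_edges`, the `dist` matrix of pairwise distances indexed by city number) are my own illustrative choices, not from any library:

```python
def build_sorted_edges(dist):
    n = len(dist)
    # One global list of edges, shortest first: the candidate middle segments.
    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    # Per-city neighbour lists in ascending distance order, obtained from a
    # single sort keyed on (city, distance).
    pairs = sorted((i, dist[i][j], j) for i in range(n) for j in range(n) if i != j)
    neighbours = {i: [] for i in range(n)}
    for i, d, j in pairs:
        neighbours[i].append((d, j))
    return edges, neighbours
```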

Now, here comes the recursion. We have a bunch of middle parts of the journey. Recursively extend each one by adding a city either to the beginning or to the end of the journey. Keep recursing until you have a complete path, or until the partial path is already longer than the best complete path seen so far. At each step, try the shorter extensions first; there's an implicit merge sort going on in this algorithm. This, again, is an optimization that lets you skip work later.

The recursion is driven by a function that takes two things: a middle chunk of the journey and the set of cities not yet visited. Apply memoization so that any time we see the same parameters, we return the same answer (by using a cache, of course). This is an important optimization.

Last of all, using the above algorithm, we'll quickly come up with the first complete path that has a decently minimal distance. Keep track of this as the best complete path seen so far. Anytime we are recursing and we come up with a partial path that is longer than the best complete path seen so far, we can stop recursing, give up, and try something else. This is an important optimization to save work "at the leaves of the tree".
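The core of the idea, the middle-out recursion with memoization keyed on (endpoints, unvisited cities), can be sketched in Python. This is a rough sketch under my own names, not a tuned implementation; I've left out the best-so-far pruning because a pure memo cache and a branch-and-bound cutoff don't combine cleanly in one function:

```python
import math
from functools import lru_cache

# Assumes `dist` is a symmetric matrix of pairwise distances indexed by
# city number. Returns the length of the shortest path visiting every city.
def shortest_path_from_middle(dist):
    n = len(dist)
    all_cities = frozenset(range(n))

    @lru_cache(maxsize=None)
    def extend(head, tail, unvisited):
        # The cache key is exactly the state described above: the two
        # endpoints of the partial path plus the set of unvisited cities.
        if not unvisited:
            return 0
        best = math.inf
        for c in unvisited:
            rest = unvisited - {c}
            best = min(best,
                       dist[c][head] + extend(c, tail, rest),   # prepend c
                       dist[tail][c] + extend(head, c, rest))   # append c
        return best

    # Seed with every edge as a middle segment, shortest edges first.
    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    return min(d + extend(i, j, all_cities - {i, j}) for d, i, j in edges)
```

Note that with full memoization the shortest-first ordering no longer saves work on its own; it only pays off when combined with the best-so-far cutoff described below.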

I can't even wrap my head around the big O of this algorithm. I suspect it's one of those cases where you use words like "amortized cost".

This algorithm does have a weakness. If every distance between cities is exactly the same, it'll try every possibility. Similarly, if every distance between cities is exactly the same except one pair of cities which has a longer distance, it'll still try every possibility. I'm not sure of the degree to which the memoization fixes this problem. It'd be good to extend the algorithm somehow to recognize this situation and short circuit, i.e. automatically throw out paths of duplicate length.

Math: Factoring Numbers

I was thinking this morning about factoring numbers. I wonder if it might sometimes be helpful to use real numbers to gain an interesting perspective in order to solve certain problems involving integer numbers (i.e. number theory problems). For instance, I was thinking about factoring large numbers.

For every natural number, C, (that isn't equal to 0 or 1), there are an infinite number of pairs of positive real numbers A and B for which A * B = C. For instance, 6 = 1.0 * 6.0 = 2.4 * 2.5 = 2.0 * 3.0 = ... I wonder if playing around with pairs of real numbers like (2.4, 2.5) can lead you to pairs of integer numbers like (2, 3).

Imagine all the pairs (A, B) for which A * B = C. Let's create a way to graph all such pairs in a funny sort of way. Let's pick a bunch of A's going from 0 to C. For each different A, we can calculate B via C / A. Let's consider the parts of A and B that are to the right of the decimal point to see if one pair, (A, B) can lead us to another pair (A', B') which are integers (i.e. have only zeros to the right of the decimal point). In fact, let's see if we can come up with a numerical analysis approach where we use estimations to hunt down (A', B').

To do this, let's create a funny sort of three dimensional graph. Here's the pseudo code (assuming we're trying to factor some number, c):
for step in range(1, LARGE_NUM_OF_STEPS + 1):
    a = (step / LARGE_NUM_OF_STEPS) * c
    b = c / a
    x = a - int(a)  # x and y will always be in the range [0, 1).
    y = b - int(b)
    z = floor(a)
    draw_point(x, y, z)
If you use a very large number for LARGE_NUM_OF_STEPS, you'll create a funny looking 3D graph. Any place where x = 0 and y = 0, you'll have a pair of integers (a, b) that multiply to equal c.
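Here's a runnable sketch of hunting for the points where both fractional parts vanish. I measure the distance to the nearest integer instead of the raw fractional part, so that floating-point noise (e.g. an `a` of 2.9999999999) doesn't hide a hit; the tolerance `eps` and the function name are my own assumptions:

```python
def near_integer_pairs(c, steps, eps=1e-6):
    """Scan a from 0 to c and collect (a, b) pairs that look like integers."""
    hits = set()
    for step in range(1, steps + 1):
        a = (step / steps) * c
        b = c / a
        # Distance to the nearest integer (robust against float error).
        x = abs(a - round(a))
        y = abs(b - round(b))
        if x < eps and y < eps:
            a_int, b_int = round(a), round(b)
            if a_int * b_int == c:  # confirm the candidate exactly
                hits.add((a_int, b_int))
    return sorted(hits)
```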

Naturally, this is an extremely expensive way to factor numbers. However, I'll bet you'd learn a lot about factoring numbers by looking at this graph. In fact, I'm guessing that looking at this graph will lead you to a numerical-analysis-style approximation algorithm for homing in on valid integer pairs (a, b) where a * b = c.

Humans are fairly good at visualizing things in 3D, but it'd be really cool to extend this graph to four dimensions (perhaps using time as the fourth dimension). The fourth dimension would be used for various values of C, from 0 to infinity. I really think that looking at such a graph would help with fast number factoring.

Updated: Fixed a couple of errors pointed out by BMeph.

Friday, November 25, 2011

Computer Science: NP-complete Problems are Really NP-complete

First of all, let me apologize for my sloppy terminology. I've always been a better software engineer than a computer scientist.

This is how Wikipedia describes NP-complete problems:
In computational complexity theory, the complexity class NP-complete (abbreviated NP-C or NPC) is a class of decision problems. A decision problem L is NP-complete if it is in the set of NP problems so that any given solution to the decision problem can be verified in polynomial-time, and also in the set of NP-hard problems so that any NP problem can be converted into L by a transformation of the inputs in polynomial-time.
Although any given solution to such a problem can be verified quickly, there is no known efficient way to locate a solution in the first place; indeed, the most notable characteristic of NP-complete problems is that no fast solution to them is known. That is, the time required to solve the problem using any currently known algorithm increases very quickly as the size of the problem grows. As a result, the time required to solve even moderately large versions of many of these problems easily reaches into the billions or trillions of years, using any amount of computing power available today. As a consequence, determining whether or not it is possible to solve these problems quickly, called the P versus NP problem, is one of the principal unsolved problems in computer science today.

While a method for computing the solutions to NP-complete problems using a reasonable amount of time remains undiscovered, computer scientists and programmers still frequently encounter NP-complete problems. NP-complete problems are often addressed by using approximation algorithms.
Note that it says, "As a consequence, determining whether or not it is possible to solve these problems quickly, called the P versus NP problem, is one of the principal unsolved problems in computer science today." I haven't seen the proof, but I've also heard that if you could prove that even one NP-complete problem has no polynomial-time solution, then that would prove that none of them do. The theory is a little over my head, but I'm going to take a shot.

I'd like to propose a problem that is NP-complete and show that it does not have a polynomial-time solution. Imagine I pick an n-tuple of random 32-bit, unsigned integers. Your job is to guess what the n-tuple is. Hence, where n is 2, I might pick (376, 792), and your job is to guess that tuple. If you guess a certain tuple, I can tell you whether you're right or not. Hence, I can verify a solution in polynomial time (it takes me O(n) int comparisons to verify a solution). However, to solve such a problem, you either have to get really lucky or you have to brute force it. If you find a method better than brute forcing every possible solution, then obviously there's something wrong with my random number generator. Using brute force to guess the solution requires O(A^n) time for some constant A. Since it's "to the n", it's not polynomial time (3nA^3 is polynomial time since the exponent is fixed; A^n is not, since the exponent varies with n).
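To make the counting concrete, here's a toy version of the game with 2-bit "integers" instead of 32-bit ones, so the brute force finishes instantly. Verifying one guess costs O(n) comparisons, while the guesser may have to walk through all A^n candidates (A = 4 values here); the names are illustrative:

```python
from itertools import product

def brute_force_guess(secret, bits=2):
    """Guess an n-tuple of `bits`-bit integers by exhaustive search."""
    values = range(2 ** bits)
    n = len(secret)
    tries = 0
    for guess in product(values, repeat=n):  # up to (2**bits)**n candidates
        tries += 1
        if guess == secret:                  # O(n) verification step
            return guess, tries
```

With 32-bit values and even modest n, that same loop would run for (2^32)^n iterations, which is the exponential blow-up described above.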

Now I know that some of you reading this understand the computer science a lot better than I do. Have I actually shown (at least informally) that NP-complete problems are really NP-complete? If not, can you explain in layman's terms what I'm misunderstanding? Thanks!

Friday, November 18, 2011

Tetris in Dart


For my first Dart program, I decided to implement Tetris. I was inspired by Alexei Kourbatov at javascripter.net, which contains an implementation of Tetris he wrote way back in 1999!

You can play my version here. The source code is here.

Overall, I like Dart. The Dart version of my code is about the same length as the JavaScript version of my code (the Dart version is slightly longer because I switched from using innerHTML to using the DOM). The combination of the optional static typing and Dart Editor helped me avoid many, many bugs. I'm still a Python guy at heart, but my experience with Dart was a very pleasant one.


I jotted down some lessons I learned. Warning: This is my first Dart program, and I haven't even read the tutorial yet, so I could be missing the boat on some of these points!

Don't try to play with DOM elements before the DOM is loaded. The easiest way to achieve this is to import your Dart-compiled JavaScript at the bottom of the page.

Dart has generics. This is a little unfamiliar for a scripting guy like me.

I had a hard time finding methods like JavaScript's Math.abs and Math.floor. They're now members of num.

This is how you add to the DOM:
HTMLDocument doc = window.document;
HTMLParagraphElement p = doc.createElement('p');
p.textContent = message;
doc.body.appendChild(p);
Using innerHTML works too, but I suspect it's frowned upon.

DOM manipulation isn't as easy as using innerHTML. It's not quite as horrible as I feared, but it's still a pain in the butt.

Sometimes you need to use a temporary variable to help out the type system. If you write:
doc.getElementById("s-" + i + "-" + j).src = 'images/s0.png';
it'll say "'src' is not a member of Element." However, you can easily fix this by writing:
HTMLImageElement img = doc.getElementById("s-" + i + "-" + j);
img.src = 'images/s0.png';
Translating from JavaScript to well-typed Dart is time-consuming, but it's a relatively straightforward process.

The intelligent code completion in Dart Editor was really helpful since I don't really know the language or its APIs. I can edit text a lot faster in Vim, but code completion helps me know what to write.

Trying to figure out the proper way to handle keyboard events in Dart is hard because of the DOM. I'm looking at developer.mozilla.org/en/DOM/KeyboardEvent, and it looks like there are two ways to do things, the deprecated way, and the way that isn't implemented yet.

Command-g is the usual Apple shortcut to find the next occurrence of something, but it doesn't work in Dart Editor.

They're working on a new DOM API. I've decided not to use it until they say it's stable.

The way main() works in the Sunflower Example doesn't match the way main() works in sample applications generated by Dart Editor. I'm not sure why.

Wednesday, November 02, 2011

Eclipse vs. Intellij for Android, PlayN, Google App Engine, and Python Development

I can't figure out whether I should use Eclipse or IntelliJ.

Eclipse is good in that it's open source and free. Furthermore, Google has released several plugins for it such as the ADT (Android Developer Tools) plugin and the Google Plugin for Eclipse. However, Eclipse generally leaves me feeling confused, overwhelmed, and out of control. I spent two days reading a bunch of tutorials, but I still feel like I can't do what I want. I installed Aptana Studio 3 (which includes PyDev) in order to play around with my Python Google App Engine project. However, I couldn't figure out how to do two things: 1) update the PYTHONPATH for my project to include a few project-specific directories, and 2) use two-space indents in Python code just for this project (which is the standard at Google).

On the other hand, there's IntelliJ. I've never used IntelliJ for Java development, but I've used PyCharm and RubyMine for Python and Ruby development. The downside is that it's fairly expensive. The upside is that it's really good. It doesn't leave me feeling confused, overwhelmed, and out of control. In general, I'm able to get it to do what I want. Furthermore, the IDEAVim keybindings are pretty good (not as good as the JVi plugin for NetBeans, but still pretty good).

I'm hoping to start with a toy project that uses the PlayN framework (i.e. Java), Android, etc. It'd be nice if I could use the same IDE for Java, Python, web stuff, etc. I don't really know all the various Google APIs such as Google App Engine for Java and Android, so an IDE that can guide me along would be helpful. It seems like the only solution is to use Eclipse for the Google stuff and IntelliJ for Python, Ruby, and web stuff. I haven't purchased a license yet, but I did win a license for RubyMine which might be applicable.

I just read Android Development: Eclipse vs. IntelliJ IDEA which said a) IntelliJ is way better b) all of the Android tools are still accessible outside of the IDE anyway. I also noticed that IntelliJ has come out with its own Android plugin, and it's even open source. I'm leaning toward IntelliJ, but I hate being the only one around using a certain tool. Any advice you guys have would be welcome.

(By the way, I'm sure someone is going to come along and plug Vim, Emacs, or TextMate. I'm personally a Vim diehard. However, I've come to appreciate the benefit of using Vim keybindings within a larger IDE. YMMV.)