Monday, April 27, 2009

Talk: Python Tools, the UNIX Philosophy, and sort Tricks

Updated link.

I recently gave a talk at BayPiggies called Python Tools, the UNIX Philosophy, and sort Tricks. Thanks go to Glen Jarvis for recording it. The "slides" are below:
This is a random collection of topics related to Python tools.

Talk about the UNIX philosophy:
Small tools.
My problems tend to be too large for RAM, but not too big for one machine.
UNIX and batch processing are a natural fit.
Multiple processes = multiple CPUs.
Multiple programming languages = more flexibility.
Pipes = concurrency without the pain.
Scales linearly and predictably, unlike databases.
UNIX tools that already exist are helpful and fast.
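Here is the "pipes = concurrency" point in Python terms. It uses the pattern from the subprocess documentation for replacing a shell pipeline. Two processes are connected by an OS pipe, so both run at once (possibly on different CPUs) while Python only collects the result. `distinct_second_column` is a hypothetical helper, roughly equivalent to `cut -f 2 FILE | sort -u`. It assumes `cut` and `sort` are on the PATH.

```python
import subprocess


def distinct_second_column(path):
    """Hypothetical helper: roughly `cut -f 2 path | sort -u`."""
    cut = subprocess.Popen(["cut", "-f", "2", path], stdout=subprocess.PIPE)
    sort = subprocess.Popen(["sort", "-u"], stdin=cut.stdout,
                            stdout=subprocess.PIPE, universal_newlines=True)
    # Close our copy of cut's stdout so cut gets SIGPIPE if sort exits early.
    cut.stdout.close()
    output, _ = sort.communicate()
    if cut.wait() != 0:
        raise subprocess.CalledProcessError(cut.returncode, "cut")
    if sort.returncode != 0:
        raise subprocess.CalledProcessError(sort.returncode, "sort")
    return output.splitlines()


if __name__ == "__main__":
    import sys
    print(distinct_second_column(sys.argv[1]))
```

Python only starts and waits on the processes. The kernel's pipe buffer lets `cut` and `sort` proceed in parallel without any threading code.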

Use the optparse module to provide consistent command line APIs:
Here's an example of the setup from the docs:
: from optparse import OptionParser
: parser = OptionParser()
: parser.add_option("-f", "--file", dest="filename",
:                   help="write report to FILE", metavar="FILE")
: parser.add_option("-q", "--quiet",
:                   action="store_false", dest="verbose", default=True,
:                   help="don't print status messages to stdout")
: (options, args) = parser.parse_args()
Here's an example of my own help text:
: Usage: cleancuttsv.py [options]
:
: Options:
:   -h, --help            show this help message and exit
:   --assert-head=FIELD1\tFIELD2\t...
:                         assert that the first line of the file matches this
:   --delete-head         delete the first line of input
:   -n NUM, --num-fields=NUM
:                         assert that there are this many fields per line
:   --drop-blank-lines    delete blank lines instead of raising an error
:
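The help text above could come from option declarations along these lines. This is a hypothetical reconstruction, not the actual cleancuttsv.py source. The `dest` names simply follow optparse's convention of turning dashes in long options into underscores.

```python
from optparse import OptionParser


def build_parser():
    """Hypothetical option setup that would print help like the text above."""
    # %prog expands to the script's file name, e.g. cleancuttsv.py.
    parser = OptionParser(usage="%prog [options]")
    parser.add_option("--assert-head", dest="assert_head",
                      metavar=r"FIELD1\tFIELD2\t...",
                      help="assert that the first line of the file matches this")
    parser.add_option("--delete-head", dest="delete_head",
                      action="store_true", default=False,
                      help="delete the first line of input")
    parser.add_option("-n", "--num-fields", dest="num_fields",
                      type="int", metavar="NUM",
                      help="assert that there are this many fields per line")
    parser.add_option("--drop-blank-lines", dest="drop_blank_lines",
                      action="store_true", default=False,
                      help="delete blank lines instead of raising an error")
    return parser


if __name__ == "__main__":
    options, args = build_parser().parse_args()
    print(options)
```

Parsing `-n 3 --delete-head input.tsv` gives `options.num_fields == 3`, `options.delete_head == True` and `args == ['input.tsv']`. optparse converts `NUM` to an int and generates the `-h` option for free.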

sort:
http://jjinux.blogspot.com/2008/08/python-sort-uniq-c-via-subprocess.html
sort -S 20% -T /mnt/some_other_drive ...
http://jjinux.blogspot.com/2008/08/python-memory-conservation-tip-sort.html
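In GNU sort, `-S` sets the size of the main memory buffer (a percentage of physical memory such as `20%` is accepted), and `-T` picks the directory for temporary files. Below is a minimal sketch of driving it from Python and counting duplicates the way `uniq -c` does. It assumes GNU sort is on the PATH and that byte-order collation (`LC_ALL=C`) is acceptable. The helper names are hypothetical.

```python
import itertools
import os
import subprocess
import tempfile


def external_sort(lines, buffer_size="20%", temp_dir=None):
    """Yield lines sorted by GNU sort, which spills to temp files, not RAM."""
    if temp_dir is None:
        temp_dir = tempfile.gettempdir()
    # Assumption: plain byte-order comparison is fine for this data.
    env = dict(os.environ, LC_ALL="C")
    proc = subprocess.Popen(["sort", "-S", buffer_size, "-T", temp_dir],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            env=env, universal_newlines=True)
    # sort emits nothing until it has read all of its input, so writing
    # everything first and reading afterwards cannot deadlock.
    for line in lines:
        proc.stdin.write(line)
    proc.stdin.close()
    for line in proc.stdout:
        yield line
    proc.stdout.close()
    if proc.wait() != 0:
        raise subprocess.CalledProcessError(proc.returncode, "sort")


def sort_uniq_c(lines, **sort_kwargs):
    """Like `sort | uniq -c`: yield (count, line) pairs in sorted order."""
    for line, group in itertools.groupby(external_sort(lines, **sort_kwargs)):
        yield sum(1 for _ in group), line
```

Python never holds more than one group in memory. The heavy lifting, including spilling to disk under the `-T` directory, stays inside sort.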

tsv:
You need a consistent format.
Downsides:
Most UNIX tools don't understand true TSV, but only an approximation thereof:
My own code raises an exception in cases where it would actually matter.
Many UNIX tools are ignorant of encoding issues:
Sometimes playing dumb works and sometimes it hurts.
Using the csv module:
: import csv
:
: DEFAULT_KARGS = dict(dialect='excel-tab', lineterminator='\n')
: MYSQL_LOAD_DATA_INFILE_DESC = """\
: FIELDS TERMINATED BY '\t'
: OPTIONALLY ENCLOSED BY '"'
: ESCAPED BY ''
: LINES TERMINATED BY '\n'"""
:
: def create_default_reader(iterable):
:     """Return a csv.reader with our default options."""
:     return csv.reader(iterable, **DEFAULT_KARGS)
: ...
Using mysqlimport.
: mysqlimport \
: --user=$MYSQL_USERNAME \
: --password=$MYSQL_PASSWORD \
: --columns=id,name \
: --fields-optionally-enclosed-by='"' \
: --fields-terminated-by='\t' \
: --fields-escaped-by='' \
: --lines-terminated-by='\n' \
: --local \
: --lock-tables \
: --replace \
: --verbose \
: $DATABASE ${BUILD}/sometable.tsv
To see warnings:
http://jjinux.blogspot.com/2009/03/mysql-encoding-hell.html
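The tsv notes mention raising an exception "in cases where it would actually matter". Here is a minimal sketch of that idea. It assumes the same `excel-tab` settings as `DEFAULT_KARGS`. The `UNSAFE_CHARS` policy and the helper names are hypothetical, not the author's code. With these settings, `csv.writer` would quote a field containing a tab, and a tool like `cut` would then split it in the wrong place. Refusing such rows keeps the file readable by both Python and plain UNIX tools.

```python
import csv
import io

# Same settings as DEFAULT_KARGS in the slides.
DEFAULT_KARGS = dict(dialect='excel-tab', lineterminator='\n')

# Hypothetical policy: refuse fields containing characters that plain tools
# such as cut or sort cannot round-trip safely.
UNSAFE_CHARS = ('\t', '\n', '\r', '"')


def check_row(row, num_fields=None):
    """Return row unchanged, or raise ValueError if naive tools would misread it."""
    if num_fields is not None and len(row) != num_fields:
        raise ValueError("expected %d fields, got %d: %r"
                         % (num_fields, len(row), row))
    for field in row:
        if any(char in field for char in UNSAFE_CHARS):
            raise ValueError("field would need quoting: %r" % (field,))
    return row


def rows_to_tsv(rows, num_fields=None):
    """Serialize rows as TSV text, refusing rows that would need quoting."""
    out = io.StringIO()
    writer = csv.writer(out, **DEFAULT_KARGS)
    for row in rows:
        writer.writerow(check_row(row, num_fields))
    return out.getvalue()


def read_tsv(lines, num_fields=None):
    """Parse TSV lines with the csv module, checking every row."""
    for row in csv.reader(lines, **DEFAULT_KARGS):
        yield check_row(row, num_fields)
```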

Show pdb in the context of a web app:
: import pdb
: from pprint import pprint
: pdb.set_trace()
: pprint(request.environ)
http://localhost:5000/api/ratio
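Here is a minimal, hypothetical sketch of the same trick in a bare WSGI app served by wsgiref. The slides' app isn't shown, and `request.environ` there comes from whatever framework it used. The breakpoint is guarded by an assumed `PDB_BREAK` environment variable, so the app can also run without stopping. Port 5000 and the `/api/ratio` path only mirror the URL above.

```python
import os
import pdb
from pprint import pprint  # imported so it is available at the (Pdb) prompt


def app(environ, start_response):
    """Tiny WSGI app (hypothetical) that can drop into pdb mid-request."""
    if os.environ.get("PDB_BREAK"):
        # Execution stops here; at the (Pdb) prompt try `pprint(environ)`,
        # then `c` to let the request finish.
        pdb.set_trace()
    body = ("You requested %s\n" % environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server
    make_server("localhost", 5000, app).serve_forever()
```

Run it with `PDB_BREAK=1` set, then load http://localhost:5000/api/ratio. The terminal running the server becomes the debugger while the browser waits for the response.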

Friday, April 17, 2009

Hardware: Lexar USB Flash Drive

I have a tiny 1G Lexar USB Flash drive. I think it's a Lexar JumpDrive FireFly. It accidentally took a trip through my Swedish front-loading washing machine ;)

It came out looking perfectly dry. The cover didn't look like it had any water in it. I popped it in, and it worked fine ;)

Saturday, April 11, 2009

Programming: 50 in 50

This is one of the coolest, most creative talks I've seen all year. "Guy L. Steele and Richard P. Gabriel gave a presentation about languages and language constructs in a presentation that is a work of art in itself." If you love programming languages even half as much as I do, this is a must see!

Oz: Don't See it Yet? Tell Me More!

I've been playing with the programming language Oz lately. It has an interactive interface. Conceptually, it's a lot like the Python shell. However, when you "Browse" (i.e. output) values, it outputs them to a second window called the "Browser". I could never figure out why a separate window was needed until just the other day.

If you feed the following into the interactive interface, it'll block (i.e. it won't show anything):
: declare A B C in
: C=A+B
: {Browse C}
However, if you feed it the following:
: A=10
: B=200
It'll display 210 in the browser. It knows that it can't display C until it figures out values for A and B.

Ok, that's kind of interesting, but it gets weirder! If you feed the following into the interactive interface, it'll show "D" in the browser:
: declare D in
: {Browse D}
Now, if you go back and actually bind D to a value:
: D=10
It'll change the D to a 10 in the browser. Crazy!
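The blocking comes from Oz's dataflow variables. `+` needs actual values, so the thread evaluating `C=A+B` suspends until A and B are bound. Here is a rough Python analogue built on `threading.Event`. `DataflowVar` and `demo` are hypothetical names, and real Oz threads are far cheaper than OS threads.

```python
import threading


class DataflowVar(object):
    """Single-assignment variable whose reads block until it is bound.

    A rough, hypothetical Python analogue of an Oz dataflow variable.
    """

    def __init__(self):
        self._bound = threading.Event()
        self._lock = threading.Lock()
        self._value = None

    def bind(self, value):
        with self._lock:
            if self._bound.is_set():
                # Rebinding to an equal value is harmless; a different
                # value is an error, much like a failed unification in Oz.
                if self._value != value:
                    raise ValueError("already bound to %r" % (self._value,))
                return
            self._value = value
            self._bound.set()

    def get(self, timeout=None):
        # Blocks the calling thread until some other thread calls bind().
        if not self._bound.wait(timeout):
            raise TimeoutError("variable is still unbound")
        return self._value


def demo():
    """Mirror `C=A+B` fed first, then `A=10` and `B=200` fed later."""
    a, b, c = DataflowVar(), DataflowVar(), DataflowVar()

    def compute_c():
        c.bind(a.get() + b.get())  # suspends until A and B both have values

    threading.Thread(target=compute_c, daemon=True).start()
    a.bind(10)
    b.bind(200)
    return c.get(timeout=5)


if __name__ == "__main__":
    print(demo())  # 210
```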

Tuesday, April 07, 2009

Apple: Broken Keyboard: Apple to the Rescue!

I left the gate to my office open, and when I went inside, I noticed the Return key on my MacBook was broken. I think my 1.5-year-old got to it.

I read online that it's best to just bring it in to the Apple Store. It turned out that there was a small plastic part that was broken. They have a drawer full of keyboards which they use to replace broken keys. However, the Apple Store near my house didn't have the right keyboard.

I went to five Apple stores in all, and none of them had the right type of keyboard. Somehow, my 2008 MacBook is different from everyone else's. The person at the fifth store checked, and my warranty had expired just two weeks earlier.

The guy at the "Genius Bar" said, "I'm not going to be petty. We'll fix it. We'll just replace the whole top case." The top case includes the touch pad, etc., and costs about $150. I gave him my laptop immediately and picked it up four days later.

In the past (about five years ago), I had some really bad experiences with Apple support, but this experience was radically different in a whole range of ways. All I can say is:

Apple, thank you!

Friday, April 03, 2009

Oz: Variable to Variable Binding

Just like in Haskell, in Oz you can do the following:
: declare A B C
: A=B
: B=C
: C=3
: {Browse A}
This shows the value 3. Crazy!
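This works because `=` in Oz is unification. After `A=B` and `B=C`, the three names refer to one store entry, so binding C binds all of them. Here is a rough Python sketch of that bookkeeping using a union-find structure. `LogicVar` and `unify` are hypothetical names. Unlike Oz, reading an unbound variable here raises an error instead of blocking.

```python
_UNBOUND = object()  # sentinel meaning "no value yet"


class LogicVar(object):
    """Hypothetical sketch of Oz-style variables that can be unified."""

    def __init__(self, name):
        self.name = name
        self._parent = self       # union-find link to an equivalent variable
        self._value = _UNBOUND    # only meaningful on a root variable

    def _root(self):
        node = self
        while node._parent is not node:
            node = node._parent
        return node

    def unify(self, other):
        """Model `self=other`, where other is a LogicVar or a plain value."""
        root = self._root()
        if isinstance(other, LogicVar):
            other_root = other._root()
            if root is other_root:
                return  # already the same variable
            if root._value is _UNBOUND:
                root._parent = other_root      # other_root keeps any value
            elif other_root._value is _UNBOUND:
                other_root._parent = root      # root keeps its value
            elif root._value == other_root._value:
                root._parent = other_root
            else:
                raise ValueError("cannot unify %r with %r"
                                 % (root._value, other_root._value))
        elif root._value is _UNBOUND:
            root._value = other
        elif root._value != other:
            raise ValueError("%s is already %r" % (self.name, root._value))

    def value(self):
        """Return the bound value; unlike Oz, raise instead of blocking."""
        value = self._root()._value
        if value is _UNBOUND:
            raise LookupError("%s is unbound" % self.name)
        return value


if __name__ == "__main__":
    A, B, C = LogicVar("A"), LogicVar("B"), LogicVar("C")
    A.unify(B)   # A=B
    B.unify(C)   # B=C
    C.unify(3)   # C=3
    print(A.value())  # 3
```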

Wednesday, April 01, 2009

Personal: My ICQ is Dead

My ICQ account is dead, and I am unable to retrieve the password. Please use "jjinux" on Yahoo, AIM, GTalk, or Jabber.