Consuming HTML
HTML in the wild may be messy, but it is vitally important.
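Don't use HTMLParser. minidom is horrible for this job: it is an XML parser, so it rejects markup that isn't well formed. Beautiful Soup is nicer. html5lib is theoretically fantastic, because it parses the way browsers do, but it's very slow. libxml is really nice. Like html5lib, it copes with broken markup, but it's way, way faster.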
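
To make the complaint about minidom concrete, here is a minimal sketch using only the standard library. The sample markup and the helper name parses_as_xml are my own, made up for illustration. It only shows that an XML parser gives up on the kind of tag soup real pages contain, not how any of the nicer libraries behave.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample of "wild" HTML: the first <p> is never closed
# and <b> is still open when </p> arrives.
MESSY_HTML = "<html><body><p>First paragraph<p>Second <b>bold</p></body></html>"

# The same kind of content, written as well-formed markup.
CLEAN_HTML = "<html><body><p>First paragraph</p><p>Second <b>bold</b></p></body></html>"


def parses_as_xml(markup: str) -> bool:
    """Return True if minidom can parse the markup, False if expat rejects it."""
    try:
        minidom.parseString(markup)
    except ExpatError:
        return False
    return True


if __name__ == "__main__":
    print("messy:", parses_as_xml(MESSY_HTML))
    print("clean:", parses_as_xml(CLEAN_HTML))
```

A browser would render both strings without complaint, which is why a parser built for forgiving HTML is the better fit here.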
Friday, March 21, 2008