html5lib: html5lib (HTML parser based on the HTML5 specification)
html5lib:
html5lib: html5lib is an HTML parser that follows the HTML5 specification.
html5lib: It handles all flavours of HTML and parses invalid documents using
html5lib: well-defined error handling rules compatible with the behaviour of
html5lib: major desktop web browsers.
html5lib:
html5lib: The parser produces a tree structure; the current release supports
html5lib: DOM, ElementTree and lxml tree formats as well as a simple custom
html5lib: format.
html5lib: