Python: module HTMLParser

This document was taken from a search engine cache. Original document address: http://www.stsci.edu/spst/UnixTransition/doc/HTMLParser.html
Last modified: Thu Nov 5 13:46:15 2015
Indexed: Mon Apr 11 00:37:34 2016

HTMLParser

/usr/local/Python-2.5/lib/python2.5/HTMLParser.py

A parser for HTML and XHTML.

Modules

markupbase
re

Classes



exceptions.Exception(exceptions.BaseException)
    HTMLParseError

markupbase.ParserBase
    HTMLParser

class HTMLParseError(exceptions.Exception)

    Exception raised for all parse errors.

Method resolution order:
    HTMLParseError
    exceptions.Exception
    exceptions.BaseException
    __builtin__.object

Methods defined here:

__init__(self, msg, position=(None, None))

__str__(self)

Data descriptors defined here:

__weakref__

list of weak references to the object (if defined)

Data and other attributes inherited from exceptions.Exception:

__new__ = <built-in method __new__ of type object at 0x2096c0>
T.__new__(S, ...) -> a new object with type S, a subtype of T

Methods inherited from exceptions.BaseException:

__delattr__(...)
x.__delattr__('name') <==> del x.name

__getattribute__(...)
x.__getattribute__('name') <==> x.name

__getitem__(...)
x.__getitem__(y) <==> x[y]

__reduce__(...)

__repr__(...)
x.__repr__() <==> repr(x)

__setattr__(...)
x.__setattr__('name', value) <==> x.name = value

__setstate__(...)

Data descriptors inherited from exceptions.BaseException:

__dict__

args

message

exception message

class HTMLParser(markupbase.ParserBase)

    Find tags and other markup and call handler functions.

    Usage:
        p = HTMLParser()
        p.feed(data)
        ...
        p.close()

    Start tags are handled by calling handle_starttag() or
    handle_startendtag(); end tags by handle_endtag().  The data
    between tags is passed from the parser to the derived class by
    calling handle_data() with the data as argument (the data may be
    split up in arbitrary chunks).  Entity references are passed by
    calling handle_entityref() with the entity reference as the
    argument.  Numeric character references are passed to
    handle_charref() with the string containing the reference as the
    argument.
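The usage pattern in the docstring only does something useful once the handler methods are overridden in a subclass. The following is a minimal sketch of that; the class name OutlineParser and its events list are hypothetical and chosen only for illustration. The import fallback assumes that on Python 3 the same class is available as html.parser.HTMLParser.

```python
try:
    from HTMLParser import HTMLParser      # Python 2 module documented here
except ImportError:
    from html.parser import HTMLParser     # assumption: Python 3 location


class OutlineParser(HTMLParser):
    """Hypothetical subclass that records every event it receives."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


p = OutlineParser()
p.feed('<a href="x.html">link</a>')
p.close()

for event in p.events:
    print(event)
```

Storing events in a list rather than printing inside the handlers keeps the parser reusable; the caller decides what to do with the result.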

Methods defined here:

__init__(self)
Initialize and reset this instance.

check_for_whole_start_tag(self, i)
# Internal -- check to see if we have a complete starttag; return end or -1 if incomplete.

clear_cdata_mode(self)

close(self)
Handle any buffered data.

error(self, message)

feed(self, data)
Feed data to the parser.  Call this as often as you want, with as little or as much text as you want (may include '\n').
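Because feed() accepts input in pieces of any size, a tag or a run of text may be cut anywhere between calls. The sketch below illustrates this under the assumption that only the text content is wanted; TextCollector and extract_text are hypothetical names, not part of the module. Since handle_data() may deliver the text in arbitrary chunks, the example joins the chunks instead of relying on their boundaries.

```python
try:
    from HTMLParser import HTMLParser      # Python 2 module documented here
except ImportError:
    from html.parser import HTMLParser     # assumption: Python 3 location


class TextCollector(HTMLParser):
    """Hypothetical subclass that accumulates the text between tags."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.chunks = []

    def handle_data(self, data):
        # May be called several times for one run of text.
        self.chunks.append(data)


def extract_text(pieces):
    """Feed an iterable of string fragments and return the joined text."""
    parser = TextCollector()
    for piece in pieces:
        parser.feed(piece)
    parser.close()                          # handle anything still buffered
    return "".join(parser.chunks)


# The fragments split both a text run and an end tag across calls.
text = extract_text(["<p>Hel", "lo, wor", "ld</p>\n<p>bye</", "p>"])
print(repr(text))
```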

get_starttag_text(self)
Return full source of start tag: '<...>'.
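A short sketch of where get_starttag_text() can be useful: handle_starttag() receives the tag name in lowercase, so the original spelling of the tag is only available through this method. RawTagParser and raw_tags are hypothetical names used for illustration.

```python
try:
    from HTMLParser import HTMLParser      # Python 2 module documented here
except ImportError:
    from html.parser import HTMLParser     # assumption: Python 3 location


class RawTagParser(HTMLParser):
    """Hypothetical subclass pairing each tag name with its raw source."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.raw_tags = []

    def handle_starttag(self, tag, attrs):
        # Called while the start tag is being processed, so the text
        # returned here is the source of the tag just seen.
        self.raw_tags.append((tag, self.get_starttag_text()))


p = RawTagParser()
p.feed('<A HREF="Page.html" Class=big>text</A>')
p.close()
print(p.raw_tags)
```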

goahead(self, end)
# Internal -- handle data as far as reasonable.  May leave state and data to be processed by a subsequent call.  If 'end' is true, force handling all data as if followed by EOF marker.

handle_charref(self, name)
# Overridable -- handle character reference

handle_comment(self, data)
# Overridable -- handle comment

handle_data(self, data)
# Overridable -- handle data

handle_decl(self, decl)
# Overridable -- handle declaration

handle_endtag(self, tag)
# Overridable -- handle end tag

handle_entityref(self, name)
# Overridable -- handle entity reference
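The handle_charref() and handle_entityref() hooks receive only the name part of a reference, without the leading '&' or '&#' and without the trailing ';'. The sketch below records what each hook is given; RefParser is a hypothetical name. One assumption matters here: on recent Python 3 versions the parser converts references itself unless convert_charrefs=False is passed, so the example tries that keyword first and falls back to the plain Python 2 constructor.

```python
try:
    from HTMLParser import HTMLParser      # Python 2 module documented here
except ImportError:
    from html.parser import HTMLParser     # assumption: Python 3 location


class RefParser(HTMLParser):
    """Hypothetical subclass that records entity and character references."""

    def __init__(self):
        try:
            # Assumption: Python 3 needs this so the hooks are still called.
            HTMLParser.__init__(self, convert_charrefs=False)
        except TypeError:
            # Python 2 constructor takes no keyword arguments.
            HTMLParser.__init__(self)
        self.refs = []

    def handle_entityref(self, name):
        self.refs.append(("entity", name))   # e.g. "lt" for "&lt;"

    def handle_charref(self, name):
        self.refs.append(("char", name))     # e.g. "62" for "&#62;"


p = RefParser()
p.feed("<p>a &lt; b &#62; c &#x3E; d</p>")
p.close()
print(p.refs)
```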

handle_pi(self, data)
# Overridable -- handle processing instruction

handle_startendtag(self, tag, attrs)
# Overridable -- finish processing of start+end tag: <tag.../>
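As a sketch of how handle_startendtag() relates to the other two tag handlers: when it is not overridden, a tag written as <tag.../> is reported through handle_starttag() followed by handle_endtag(). Overriding it lets a subclass treat such tags separately. TagLogger and SelfClosingLogger are hypothetical names.

```python
try:
    from HTMLParser import HTMLParser      # Python 2 module documented here
except ImportError:
    from html.parser import HTMLParser     # assumption: Python 3 location


class TagLogger(HTMLParser):
    """Hypothetical subclass that logs start and end tags only."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.log = []

    def handle_starttag(self, tag, attrs):
        self.log.append("start:" + tag)

    def handle_endtag(self, tag):
        self.log.append("end:" + tag)


class SelfClosingLogger(TagLogger):
    """Same as TagLogger, but reports <tag/> as a single event."""

    def handle_startendtag(self, tag, attrs):
        self.log.append("startend:" + tag)


plain = TagLogger()
plain.feed("<br/><p>x</p>")
plain.close()

custom = SelfClosingLogger()
custom.feed("<br/><p>x</p>")
custom.close()

print(plain.log)
print(custom.log)
```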

handle_starttag(self, tag, attrs)
# Overridable -- handle start tag

parse_endtag(self, i)
# Internal -- parse endtag, return end or -1 if incomplete

parse_pi(self, i)
# Internal -- parse processing instr, return end or -1 if not terminated

parse_starttag(self, i)
# Internal -- handle starttag, return end or -1 if not terminated

reset(self)
Reset this instance.  Loses all unprocessed data.

set_cdata_mode(self)

unescape(self, s)
# Internal -- helper to remove special character quoting

unknown_decl(self, data)

Data and other attributes defined here:

CDATA_CONTENT_ELEMENTS = ('script', 'style')

Methods inherited from markupbase.ParserBase:

getpos(self)
Return current line number and offset.
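A small sketch of getpos() in use: called from inside a handler, it reports where the parser currently is, which for handle_starttag() is the beginning of the tag being handled. Line numbers start at 1 and offsets at 0. PositionParser and positions are hypothetical names.

```python
try:
    from HTMLParser import HTMLParser      # Python 2 module documented here
except ImportError:
    from html.parser import HTMLParser     # assumption: Python 3 location


class PositionParser(HTMLParser):
    """Hypothetical subclass that records where each start tag begins."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.positions = []

    def handle_starttag(self, tag, attrs):
        # getpos() returns a (line number, offset) pair
        self.positions.append((tag, self.getpos()))


p = PositionParser()
p.feed("<html>\n  <body>\n<p>hi</p>")
p.close()

for tag, (line, offset) in p.positions:
    print("%s at line %d, offset %d" % (tag, line, offset))
```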

parse_comment(self, i, report=1)
# Internal -- parse comment, return length or -1 if not terminated

parse_declaration(self, i)
# Internal -- parse declaration (for use by subclasses).

parse_marked_section(self, i, report=1)
# Internal -- parse a marked section
# Override this to handle MS-word extension syntax <![if word]>content<![endif]>

updatepos(self, i, j)
# Internal -- update line number and offset.  This should be called for each piece of data exactly once, in order -- in other words the concatenation of all the input strings to this function should be exactly the entire input.

Data

attrfind = <_sre.SRE_Pattern object at 0x8f98e8>
charref = <_sre.SRE_Pattern object at 0x188048e0>
commentclose = <_sre.SRE_Pattern object at 0x1894abe8>
endendtag = <_sre.SRE_Pattern object at 0x189554a0>
endtagfind = <_sre.SRE_Pattern object at 0x1856a7f0>
entityref = <_sre.SRE_Pattern object at 0x183fe3e0>
incomplete = <_sre.SRE_Pattern object at 0x189461e0>
interesting_cdata = <_sre.SRE_Pattern object at 0x1894acb8>
interesting_normal = <_sre.SRE_Pattern object at 0x18953a20>
locatestarttagend = <_sre.SRE_Pattern object at 0x187874f0>
piclose = <_sre.SRE_Pattern object at 0x189554a0>
starttagopen = <_sre.SRE_Pattern object at 0x18951180>
tagfind = <_sre.SRE_Pattern object at 0x186085c0>
