pickles2/px2-px2dthelper v2.2.1 API Document

simple_html_dom.php

Website: http://sourceforge.net/projects/simplehtmldom/
Acknowledge: Jose Solorzano (https://sourceforge.net/projects/php-html/)
Contributions by:
    Yousuke Kumakura (Attribute filters)
    Vadim Voituk (Negative indexes supports of "find" method)
    Antcs (Constructor with automatically load contents either text or file/url)

All affected sections have comments starting with "PaperG".

PaperG - Added case-insensitive testing of the value of the selector.

PaperG - Added tag_start for the starting index of tags. NOTE: This works, but not accurately. tag_start is counted AFTER \r\n have been crushed out and after the remove_noise calls, so it does not reflect the REAL position of the tag in the source; it will almost always be smaller by some amount. We use it to determine how far into the file the tag in question is. This "percentage" will never be accurate, as $dom->size is the "real" number of bytes the dom was created from, but for most purposes it is a really good estimation.

PaperG - Added forceTagsClosed to the dom constructor. Forcing tags closed is great for malformed html, but it CAN lead to parsing errors. This allows the user to tell us how much they trust the html.

PaperG - Added text and plaintext to the selectors for the find syntax. plaintext implies text in the innertext of a node; text implies that the tag is a text node. This allows us to find tags based on the text they contain.

PaperG - Created find_ancestor_tag to see if a tag is, at any level, inside of another specific tag.

PaperG - Added parse_charset so that we know about the character set of the source document. NOTE: If the user's system has a routine called get_last_retrieve_url_contents_content_type available, we will assume it returns the content-type header from the last transfer or curl_exec, and we will parse that and use it in preference to any other method of charset detection.

Found infinite loop in the case of broken html in restore_noise. Rewrote to protect from that.

PaperG (John Schlick) - Added get_display_size for "IMG" tags.

Licensed under The MIT License.
Redistributions of files must retain the above copyright notice.

Interfaces, Classes, Traits and Enums

simple_html_dom_node: simple html dom node. PaperG - added ability for the "find" routine to lowercase the value of the selector.
simple_html_dom: simple html dom parser. PaperG - in the find routine, allow us to specify that we want case-insensitive testing of the value of the selector.

Table of Contents

DEFAULT_BR_TEXT = "\r\n"
DEFAULT_SPAN_TEXT = " "
DEFAULT_TARGET_CHARSET = 'UTF-8'
HDOM_INFO_BEGIN = 0
HDOM_INFO_END = 1
HDOM_INFO_ENDSPACE = 7
HDOM_INFO_INNER = 5
HDOM_INFO_OUTER = 6
HDOM_INFO_QUOTE = 2
HDOM_INFO_SPACE = 3
HDOM_INFO_TEXT = 4
HDOM_QUOTE_DOUBLE = 0
HDOM_QUOTE_NO = 3
HDOM_QUOTE_SINGLE = 1
HDOM_TYPE_COMMENT = 2
HDOM_TYPE_ELEMENT = 1
HDOM_TYPE_ENDTAG = 4
HDOM_TYPE_ROOT = 5
HDOM_TYPE_TEXT = 3
HDOM_TYPE_UNKNOWN = 6
file_get_html() : mixed
str_get_html() : mixed
dump_html_tree() : mixed

Constants

DEFAULT_BR_TEXT

    public mixed DEFAULT_BR_TEXT = "\r\n"

DEFAULT_SPAN_TEXT

    public mixed DEFAULT_SPAN_TEXT = " "

DEFAULT_TARGET_CHARSET

    public mixed DEFAULT_TARGET_CHARSET = 'UTF-8'

HDOM_INFO_BEGIN

    public mixed HDOM_INFO_BEGIN = 0

HDOM_INFO_END

    public mixed HDOM_INFO_END = 1

HDOM_INFO_ENDSPACE

    public mixed HDOM_INFO_ENDSPACE = 7

HDOM_INFO_INNER

    public mixed HDOM_INFO_INNER = 5

HDOM_INFO_OUTER

    public mixed HDOM_INFO_OUTER = 6

HDOM_INFO_QUOTE

    public mixed HDOM_INFO_QUOTE = 2

HDOM_INFO_SPACE

    public mixed HDOM_INFO_SPACE = 3

HDOM_INFO_TEXT

    public mixed HDOM_INFO_TEXT = 4

HDOM_QUOTE_DOUBLE

    public mixed HDOM_QUOTE_DOUBLE = 0

HDOM_QUOTE_NO

    public mixed HDOM_QUOTE_NO = 3

HDOM_QUOTE_SINGLE

    public mixed HDOM_QUOTE_SINGLE = 1

HDOM_TYPE_COMMENT

    public mixed HDOM_TYPE_COMMENT = 2

HDOM_TYPE_ELEMENT

    public mixed HDOM_TYPE_ELEMENT = 1

HDOM_TYPE_ENDTAG

    public mixed HDOM_TYPE_ENDTAG = 4

HDOM_TYPE_ROOT

    public mixed HDOM_TYPE_ROOT = 5

HDOM_TYPE_TEXT

    public mixed HDOM_TYPE_TEXT = 3

HDOM_TYPE_UNKNOWN

    public mixed HDOM_TYPE_UNKNOWN = 6
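
The HDOM_TYPE_* values are plain integers, so a numeric node type is hard to read in debug output. The following is a minimal sketch of a lookup that turns such a value back into its constant name. The helper hdom_type_name() and the HDOM_TYPE_NAMES table are hypothetical names introduced for illustration and are not part of the library; the integer values are copied from the constants listed on this page, so the snippet runs without loading simple_html_dom.php.

```php
<?php
// Values copied from the HDOM_TYPE_* constants documented on this page.
// Literal integers are used so the sketch runs without the library loaded.
const HDOM_TYPE_NAMES = [
    1 => 'HDOM_TYPE_ELEMENT',
    2 => 'HDOM_TYPE_COMMENT',
    3 => 'HDOM_TYPE_TEXT',
    4 => 'HDOM_TYPE_ENDTAG',
    5 => 'HDOM_TYPE_ROOT',
    6 => 'HDOM_TYPE_UNKNOWN',
];

/**
 * Hypothetical helper (not part of the library): map a numeric
 * node type to the name of the matching HDOM_TYPE_* constant.
 */
function hdom_type_name(int $type): string
{
    return HDOM_TYPE_NAMES[$type] ?? "unrecognized type ($type)";
}

echo hdom_type_name(1), "\n"; // HDOM_TYPE_ELEMENT
echo hdom_type_name(3), "\n"; // HDOM_TYPE_TEXT
echo hdom_type_name(9), "\n"; // unrecognized type (9)
```

If the library is loaded, the same table could be keyed by the constants themselves instead of literal integers.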

Functions

file_get_html()

    file_get_html(mixed $url[, mixed $use_include_path = false][, mixed $context = null][, mixed $offset = -1][, mixed $maxLen = -1][, mixed $lowercase = true][, mixed $forceTagsClosed = true][, mixed $target_charset = DEFAULT_TARGET_CHARSET][, mixed $stripRN = true][, mixed $defaultBRText = DEFAULT_BR_TEXT][, mixed $defaultSpanText = DEFAULT_SPAN_TEXT]) : mixed

Parameters

$url : mixed
$use_include_path : mixed = false
$context : mixed = null
$offset : mixed = -1
$maxLen : mixed = -1
$lowercase : mixed = true
$forceTagsClosed : mixed = true
$target_charset : mixed = DEFAULT_TARGET_CHARSET
$stripRN : mixed = true
$defaultBRText : mixed = DEFAULT_BR_TEXT
$defaultSpanText : mixed = DEFAULT_SPAN_TEXT

Return values

mixed

str_get_html()

    str_get_html(mixed $str[, mixed $lowercase = true][, mixed $forceTagsClosed = true][, mixed $target_charset = DEFAULT_TARGET_CHARSET][, mixed $stripRN = true][, mixed $defaultBRText = DEFAULT_BR_TEXT][, mixed $defaultSpanText = DEFAULT_SPAN_TEXT]) : mixed

Parameters

$str : mixed
$lowercase : mixed = true
$forceTagsClosed : mixed = true
$target_charset : mixed = DEFAULT_TARGET_CHARSET
$stripRN : mixed = true
$defaultBRText : mixed = DEFAULT_BR_TEXT
$defaultSpanText : mixed = DEFAULT_SPAN_TEXT

Return values

mixed
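
Because every parameter and the return value are typed only as mixed, a short usage sketch may help show how the arguments line up. It assumes that simple_html_dom.php sits next to the script (the path is an assumption), and that the returned object offers find(), clear(), and the href and plaintext node properties as in the upstream Simple HTML DOM library; none of these are documented on this page, so treat them as assumptions about this bundled copy.

```php
<?php
// Assumption: the library file is located next to this script; adjust the path.
require_once __DIR__ . '/simple_html_dom.php';

$html = '<ul><li><a href="/a">First</a></li><li><a href="/b">Second</a></li></ul>';

// All optional arguments are written out with their documented defaults.
$dom = str_get_html(
    $html,
    true,                   // $lowercase
    true,                   // $forceTagsClosed
    DEFAULT_TARGET_CHARSET, // $target_charset
    true,                   // $stripRN
    DEFAULT_BR_TEXT,        // $defaultBRText
    DEFAULT_SPAN_TEXT       // $defaultSpanText
);

$links = [];
// The return type is documented as mixed, so check for an object before use.
if (is_object($dom)) {
    // Assumption: find() returns the nodes matching the selector.
    foreach ($dom->find('a') as $a) {
        $links[] = ['href' => $a->href, 'text' => $a->plaintext];
    }
    $dom->clear(); // assumption: releases the parser's internal references
}

print_r($links);
```

The is_object() guard is a deliberately loose check: it avoids relying on a specific failure value, which this page does not state.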

dump_html_tree()

    dump_html_tree(mixed $node[, mixed $show_attr = true], mixed $deep) : mixed

Parameters

$node : mixed
$show_attr : mixed = true
$deep : mixed

Return values

mixed
