Defining Scopes

Defining Custom Scopes

  • Custom scope tags can be created in HFL using the 'defscope' declaration, which defines keywords that are to be treated as scope tags.
  • Once a scope tag is declared, it can be used in the same way as any of the pre-defined scope tags such as the tag for 'p'.
  • The 'defscope' declaration is also how special scopes, such as pattern-based scopes, are defined, together with the regex patterns that delineate them.
  • Declaring a scope tag is necessary for the HFL parser to know when an unknown identifier should be treated as a scope and not as a property. This is important because scopes and properties are processed in fundamentally different ways:
    • When an unknown identifier appears in scope syntax, it will usually be recognized as a scope tag, such as the keyword 'item' in:
      • item:all { .... }

    • But when the identifier is not used in scope syntax, the lexer may not recognize it as a scope tag, such as the keywords 'header' and 'body_content' in:
      • cut {title header? body_content}

  • Because many identifiers such as 'title', 'header' etc. have the same syntactic form as a property, the parser will mark them as property names unless they are pre-declared as scope tags using a 'defscope' declaration.
  • SYNTAX:
  • defscope <tag-spec> [, <tag-spec>]* ;

    where:
    <tag-spec>      := <tag-dcl> | <patt-tag-dcl> | <sep-tag-dcl>

    <tag-dcl>       := <tag-name> [([<tag-options>])]
    <patt-tag-dcl>  := <tag-name> ([<tag-options>] patt, <regex-expr> [, <patt-opts>])
    <sep-tag-dcl>   := <tag-name> ([<tag-options>] sep, <regex-expr>)

    <tag-options>   := [<alias-tag>,] [<tag-types>,]
    <tag-types>     := <tag-type> [, <tag-type>]*
    <tag-type>      := raw | block | inline | unpaired | generic | builtin
    <patt-opts>     := <use-capture> [, <max-ch>]
    <regex-expr>    := <inline-regex> | <regex-object>

    <inline-regex>  := /<regex-patt>/[<regex-flags>]
    <regex-patt>    := regular-expression pattern characters
    <regex-flags>   := <regex-flag>+
    <regex-flag>    := s    // Regex s: Treat match as a single line
                     | m    // Regex m: Match on multiple lines
                     | i    // Regex i: Match case-insensitive
                     | b    // Regex ^: Match beginning of parent scope
                     | e    // Regex $: Match end of parent scope

    <use-capture>   := Only match contents of <use-capture>-th capture group
    <max-ch>        := maximum num of chars that can match pattern
    • There are three types of tags that can be declared:
      • Normal tags, which match keywords in HTML-tag format. For example, a declaration of the tag 'body_content' would match the HTML tag <body_content> in application code.
      • Pattern-based tags, declared with the 'patt' type option, which scan the text using a regular-expression pattern (or 'regex' pattern) to identify specific character sequences to include in the tag's scope. An example of this would be the 'sentence' tag which matches characters up to the next period character.
      • Separator-based tags, declared with the 'sep' type option, which scan the text using a regex pattern to identify specific character sequences to use as separators between two scopes of that type. An example of this would be to use a newline character to separate paragraphs in the 'p-nl' tag.
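
The difference between pattern-based and separator-based scanning can be illustrated outside HFL. The following is a minimal Python sketch, not HFL's implementation: the helper names pattern_scopes and separator_scopes are hypothetical, and the sentence and newline patterns are simplified stand-ins for the 'sentence' and 'p-nl' tags mentioned above.

```python
import re

def pattern_scopes(text, pattern):
    """Pattern-style scan: each regex match becomes one scope range."""
    return [m.span() for m in re.finditer(pattern, text)]

def separator_scopes(text, separator):
    """Separator-style scan: the text between separator matches becomes the scopes."""
    ranges, pos = [], 0
    for m in re.finditer(separator, text):
        ranges.append((pos, m.start()))  # scope ends where the separator starts
        pos = m.end()                    # next scope begins after the separator
    ranges.append((pos, len(text)))      # trailing scope after the last separator
    return ranges

text = "One. Two.\nThree."
# Hypothetical 'sentence' pattern: characters up to and including the next period.
sentences = [text[s:e] for s, e in pattern_scopes(text, r"[^.]+\.")]
# Hypothetical paragraph separator: a single newline.
paragraphs = [text[s:e] for s, e in separator_scopes(text, r"\n")]
```

In the pattern case the matched characters are the scope; in the separator case the matched characters are excluded and the scopes are what lies between them.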
    • The <alias-tag> specifies that when the tag matches, the final tag assigned will be <alias-tag>. It also specifies that the new tag will have the same scope level as the alias.
    • The <tag-type> options control how the scope of the tag is to be interpreted.
      • The 'unpaired' option specifies that the tag has no closing tag (for example, the <img> HTML tag). Declaring a tag as 'unpaired' can be useful when a tag has no inner content, because it is easier to type and leaves the code less cluttered.
      • The 'raw' option specifies that the contents of the tag are to be treated as raw text and that any HTML delimiters are not to be parsed into structure.
      • The 'block' and 'inline' options tell HFL to treat the scope during layout as if it has the CSS 'display: block' and 'display: inline' properties, respectively.
      • The 'builtin' option tells HFL to treat the tag as it would a built-in HTML tag, that is, to leave the name unmodified on output instead of changing it to a <div> or <span>, which is the default.
      • The 'generic' option tells HFL to create a tag that can be found using scope specifications, but does not generate an HTML tag on output. This can be useful for marking content without creating extra HTML layers.
    • The <regex-patt> regex patterns used in 'patt' and 'sep' declarations follow the rules of regex patterns used in the Perl language. For details of Perl's regex syntax and usage, consult the Perl documentation at http://perldoc.perl.org/perlre.html
    • The <regex-expr> arguments must be valid regex objects, either as in-line regex expressions, or as expressions that evaluate to regex objects. For example, a regex object can be created using a 'var' variable declaration or constructed using 'new RegExp()'.
    • The <use-capture> option in patt declarations restricts the characters captured into a scope to be those that match a specific group within the regex expression. This allows a scope pattern to contain characters outside of those that belong in the scope in order to identify the scope properly, but not include them in the scope's character range.
      • To use this option, the portion of the regex pattern containing the actual characters to be included in the scope is enclosed within a regex grouping (between '(' and ')' characters), and then the <use-capture> value is set to that regex group number.
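
The effect of <use-capture> can be sketched with an ordinary regex engine. This is an illustrative Python model only, assuming that a value of 0 selects the whole match; the function scope_range is hypothetical and not part of HFL. The pattern is the one used by the 'quoted_word' example in this section.

```python
import re

def scope_range(text, pattern, use_capture=0):
    """Return the (start, end) range of the chosen capture group, or None.

    use_capture=0 selects the whole match; use_capture=n selects group n.
    """
    m = re.search(pattern, text)
    if m is None:
        return None
    return m.span(use_capture)

text = "say 'hello' now"
full = scope_range(text, r"'(\w+)'")       # range includes the quotes
inner = scope_range(text, r"'(\w+)'", 1)   # range covers only the word
quoted = text[full[0]:full[1]]
word = text[inner[0]:inner[1]]
```

The quotes are still required for the pattern to match, but with the capture group selected they fall outside the resulting range.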
    • The <max-ch> option in patt declarations limits how many characters will be included in the scope range during the matching process. When that limit is exceeded, the match will fail.
      • This is useful in cases where a pattern might enclose an unacceptably large number of characters, which generally indicates that the match is bogus. An example would be to limit the number of characters included in a 'lead-in emphasis' pattern at the start of a paragraph of text.
      • NOTE: Separator-based scopes that are defined using regex patterns are less efficient than those that use simple strings. When HFL runs as a server-side program, string separators should be used instead of regex patterns where possible.
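
The <max-ch> limit described above can be modeled as a length check applied after the regex matches. The Python sketch below is an assumption-laden illustration, not HFL's implementation: match_with_limit is a hypothetical helper, the lead-in pattern is a simplified stand-in, and it assumes the limit applies to the range selected by <use-capture>.

```python
import re

def match_with_limit(text, pattern, use_capture=0, max_ch=None):
    """Return the matched scope text, or None if no match or the range is too long."""
    m = re.search(pattern, text)
    if m is None:
        return None
    start, end = m.span(use_capture)
    if max_ch is not None and end - start > max_ch:
        return None  # limit exceeded: treat the match as failed
    return text[start:end]

# Simplified lead-in pattern: leading words ending in ':' or '?'.
LEAD_IN = r"^\s*[\w ,.'\-]+[:?]"

short = match_with_limit("Note: keep it short.", LEAD_IN, 0, 80)
long_text = "word " * 30 + "end?"
too_long = match_with_limit(long_text, LEAD_IN, 0, 80)
```

Here the short lead-in is accepted, while the second text matches the pattern but is rejected because the matched range exceeds 80 characters.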
    • EXAMPLES:
    • // Declare that 'body_content' is an alias for (of type) 'div'
      defscope body_content(div);

      // Define a patt scope that finds leading 'outline' characters
      defscope num_lead(patt, /^^\s*(\w+\.)\s/);

      // Declare some tags as 'raw', and others as 'unpaired'
      defscope code_block(p, raw), code(p, raw),
        syntax_block(div, raw), syntax(p, raw), 
        link2(unpaired), prev_page(unpaired), next_page(unpaired);

      // Declare tag <nav> to be treated as a built-in HTML tag
      defscope nav(builtin);

      // Create a scope of words contained within quotes, but exclude the quotes
      defscope quoted_word(patt, /'(\w+)'/, 1);

      // Identify the lead-in characters of a paragraph, up to 80 chars
      var max_leadin_chars = 80;
      var szRe = /^^\s*((?:\w|\-|\,|\.|\'|\"|\x91|\x92|\x93|\x94)+\s*)+(\:|\?)/;
      defscope lead_in(span, patt, szRe, 0, max_leadin_chars);

      // Define scope 'topic' that is separated by lines with 3 dashes
      defscope topic_nl_d(topic, sep, "\n---\n");

      // Create a 'col' scope that is separated by two or more whitespace chars
      defscope col_ws_d(col, sep, /\s\s+/);

    • Behavior of the scope tags
    • Scope matches of normal tags and separator tags will result in scopes that are single cells, because they match or create HTML tags.
    • Scope matches of pattern-based scopes can result in the scopes being a range of cells in the form of an array, rather than just a single cell. This may be suitable for applying simple properties such as color. But for an action that requires the match to be a single cell, the <alias-tag> must be specified. This will cause the match to be a fully tagged scope, which can be found easily later using the <alias-tag> tag.
    • NOTE: One consequence of a scope match being multiple cells is that the pseudo-variable '$cell_num' will only return the value of the first cell of the pattern match if the tag does not have an <alias-tag>. In such cases, to get the full range it is necessary to use the pseudo-variable '$cell_range'.
    • NOTE: The 'defscope' declaration for a tag with an 'unpaired' option must execute before any files that try to use that tag are loaded, because it affects how the parser reads the files. This usually means that such declarations must be made in an option file that loads via the command line '-opt' option.