Diffstat (limited to 'ctags/docs/parser-html.html')
-rw-r--r-- | ctags/docs/parser-html.html | 135 |
1 files changed, 135 insertions, 0 deletions
diff --git a/ctags/docs/parser-html.html b/ctags/docs/parser-html.html new file mode 100644 index 0000000..0e7f6f5 --- /dev/null +++ b/ctags/docs/parser-html.html @@ -0,0 +1,135 @@ + +<!DOCTYPE html> + +<html> + <head> + <meta charset="utf-8" /> + <meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" /> + + <title>The new HTML parser — Universal Ctags 0.3.0 documentation</title> + <link rel="stylesheet" type="text/css" href="_static/pygments.css" /> + <link rel="stylesheet" type="text/css" href="_static/classic.css" /> + + <script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script> + <script src="_static/jquery.js"></script> + <script src="_static/underscore.js"></script> + <script src="_static/doctools.js"></script> + + <link rel="index" title="Index" href="genindex.html" /> + <link rel="search" title="Search" href="search.html" /> + <link rel="next" title="puppetManifest parser" href="parser-puppetManifest.html" /> + <link rel="prev" title="The new C/C++ parser" href="parser-cxx.html" /> + </head><body> + <div class="related" role="navigation" aria-label="related navigation"> + <h3>Navigation</h3> + <ul> + <li class="right" style="margin-right: 10px"> + <a href="genindex.html" title="General Index" + accesskey="I">index</a></li> + <li class="right" > + <a href="parser-puppetManifest.html" title="puppetManifest parser" + accesskey="N">next</a> |</li> + <li class="right" > + <a href="parser-cxx.html" title="The new C/C++ parser" + accesskey="P">previous</a> |</li> + <li class="nav-item nav-item-0"><a href="index.html">Universal Ctags 0.3.0 documentation</a> »</li> + <li class="nav-item nav-item-1"><a href="parsers.html" accesskey="U">Parsers</a> »</li> + <li class="nav-item nav-item-this"><a href="">The new HTML parser</a></li> + </ul> + </div> + + <div class="document"> + <div class="documentwrapper"> + <div 
class="bodywrapper"> + <div class="body" role="main"> + + <section id="the-new-html-parser"> +<span id="html"></span><h1>The new HTML parser<a class="headerlink" href="#the-new-html-parser" title="Permalink to this headline">¶</a></h1> +<dl class="field-list simple"> +<dt class="field-odd">Maintainer</dt> +<dd class="field-odd"><p>Jiri Techet &lt;<a class="reference external" href="mailto:techet%40gmail.com">techet<span>@</span>gmail<span>.</span>com</a>&gt;</p> +</dd> +</dl> +<section id="introduction"> +<h2>Introduction<a class="headerlink" href="#introduction" title="Permalink to this headline">¶</a></h2> +<p>The old HTML parser was line-oriented and based on regular expression matching. This +brought several limitations, such as the inability to handle tags +spanning multiple lines and the failure to respect HTML comments. In addition, the speed +of the parser depended on the number of regular expressions: the more tag types +were extracted, the more regular expressions were needed and the slower the +parser became. Finally, parsing of embedded JavaScript was very limited: it was based +on regular expressions and detected only function declarations.</p> +<p>The new parser is hand-written and separates lexical analysis (dividing +the input into tokens) from syntax analysis. The parser has been profiled and +optimized for speed, so it is one of the fastest parsers in Universal Ctags. +It handles HTML comments correctly and, in addition to the existing tags, it also +extracts &lt;h1&gt;, &lt;h2&gt; and &lt;h3&gt; headings. It should be reasonably simple to add new +tag types.</p> +<p>Finally, the parser uses the new Universal Ctags facility for running another +parser on other languages embedded in a host language. This is used for +parsing JavaScript within &lt;script&gt; tags and CSS within &lt;style&gt; tags. This +simplifies the parser and generates much better results than having a simplified +JavaScript or CSS parser within the HTML parser. 
To run the JavaScript and CSS parsers +from the HTML parser, use the <cite>--extras=+g</cite> option.</p> +</section> +</section> + + + <div class="clearer"></div> + </div> + </div> + </div> + <div class="sphinxsidebar" role="navigation" aria-label="main navigation"> + <div class="sphinxsidebarwrapper"> + <h3><a href="index.html">Table of Contents</a></h3> + <ul> +<li><a class="reference internal" href="#">The new HTML parser</a><ul> +<li><a class="reference internal" href="#introduction">Introduction</a></li> +</ul> +</li> +</ul> + + <h4>Previous topic</h4> + <p class="topless"><a href="parser-cxx.html" + title="previous chapter">The new C/C++ parser</a></p> + <h4>Next topic</h4> + <p class="topless"><a href="parser-puppetManifest.html" + title="next chapter">puppetManifest parser</a></p> +<div id="searchbox" style="display: none" role="search"> + <h3 id="searchlabel">Quick search</h3> + <div class="searchformwrapper"> + <form class="search" action="search.html" method="get"> + <input type="text" name="q" aria-labelledby="searchlabel" /> + <input type="submit" value="Go" /> + </form> + </div> +</div> +<script>$('#searchbox').show(0);</script> + </div> + </div> + <div class="clearer"></div> + </div> + <div class="related" role="navigation" aria-label="related navigation"> + <h3>Navigation</h3> + <ul> + <li class="right" style="margin-right: 10px"> + <a href="genindex.html" title="General Index" + >index</a></li> + <li class="right" > + <a href="parser-puppetManifest.html" title="puppetManifest parser" + >next</a> |</li> + <li class="right" > + <a href="parser-cxx.html" title="The new C/C++ parser" + >previous</a> |</li> + <li class="nav-item nav-item-0"><a href="index.html">Universal Ctags 0.3.0 documentation</a> »</li> + <li class="nav-item nav-item-1"><a href="parsers.html" >Parsers</a> »</li> + <li class="nav-item nav-item-this"><a href="">The new HTML parser</a></li> + </ul> + </div> + <div class="footer" role="contentinfo"> + © Copyright 2015, Universal Ctags Team. 
+ Last updated on 11 Jun 2021. + Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.0.2. + </div> + </body> +</html>
\ No newline at end of file
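
Note on the Introduction text added by this diff: the split between lexical analysis and syntax analysis that it describes can be illustrated with a small sketch. This is not the Universal Ctags implementation (which is written in C); it is a simplified Python illustration, and the names `lex`, `tag_name`, `parse_headings` and `SAMPLE` are hypothetical. It assumes only what the text states: tags may span several lines, HTML comments must be skipped, and h1 to h3 headings are extracted.

```python
from typing import Iterator, List, Tuple

Token = Tuple[str, str, int]  # (kind, raw text, line where the token starts)
HEADINGS = {"h1", "h2", "h3"}


def lex(src: str) -> Iterator[Token]:
    """Lexical analysis: split the input into comment, tag and text tokens."""
    i, line, n = 0, 1, len(src)
    while i < n:
        if src.startswith("<!--", i):
            end = src.find("-->", i + 4)
            end = n if end == -1 else end + 3
            kind = "comment"
        elif src[i] == "<":
            end = src.find(">", i)
            end = n if end == -1 else end + 1
            kind = "tag"
        else:
            end = src.find("<", i)
            end = n if end == -1 else end
            kind = "text"
        yield kind, src[i:end], line
        # Tokens are not line-bound, so count the newlines they contain.
        line += src.count("\n", i, end)
        i = end


def tag_name(tag_text: str) -> str:
    """Return the lower-cased name of a tag token, e.g. 'h1' or '/h1'."""
    inner = tag_text[1:].rstrip(">").strip()
    return inner.split(None, 1)[0].lower() if inner else ""


def parse_headings(src: str) -> List[Tuple[str, str, int]]:
    """Syntax analysis: collect (kind, title, line) for h1-h3 headings."""
    found = []
    current = None  # (kind, start line, text parts) while inside a heading
    for kind, text, line in lex(src):
        if kind == "comment":
            continue  # anything inside a comment is never reported
        if kind == "tag":
            name = tag_name(text)
            if current is None and name in HEADINGS:
                current = (name, line, [])
            elif current is not None and name == "/" + current[0]:
                title = " ".join("".join(current[2]).split())
                if title:
                    found.append((current[0], title, current[1]))
                current = None
        elif current is not None:
            current[2].append(text)
    return found


SAMPLE = """<!-- <h1>ignored</h1> -->
<h1
  class="title">Main
  title</h1>
<p>text</p>
<h2>Sub <em>section</em></h2>
"""

if __name__ == "__main__":
    for entry in parse_headings(SAMPLE):
        print(entry)
```

Because the lexer works on the whole input rather than line by line, the multi-line `<h1 ...>` tag in `SAMPLE` is a single token and the commented-out heading is dropped before the syntax stage sees it, which are the two limitations of the old regex-based parser named in the text. Handing `<script>` and `<style>` bodies to guest parsers (enabled with `--extras=+g`) is outside the scope of this sketch.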