[article AutoIndex
    [quickbook 1.4]
    [copyright 2008 John Maddock]
    [license
        Distributed under the Boost Software License, Version 1.0.
        (See accompanying file LICENSE_1_0.txt or copy at
        [@http://www.boost.org/LICENSE_1_0.txt])
    ]
    [authors [Maddock, John]]
    [/last-revision $Date: 2008-11-04 17:11:53 +0000 (Tue, 04 Nov 2008) $]
]

[section:overview Overview]

AutoIndex is a tool for taking the grunt work out of indexing a
Quickbook\/Boostbook\/Docbook document that describes C\/C++ code.

Traditionally, in order to index a Docbook document you would have to
manually add a large amount of `<indexterm>` markup: in fact one `<indexterm>`
for each occurrence of each term to be indexed. Instead AutoIndex will scan
one or more C\/C++ header files and extract all the ['function], ['class],
['macro] and ['typedef] names that are defined by those headers, and then
insert the `<indexterm>`s into the XML document for you.

AutoIndex creates index entries as follows: for each occurrence of each
search term, it creates two index entries. One has the search term as the
primary index key and the title of the section it appears in as a subterm;
the other has the section title as the main index entry and the search term
as the subentry. Thus the user has two chances to find what they're looking
for, based upon either the section name or the ['function], ['class],
['macro] or ['typedef] name.

So for example in Boost.Math the class name `students_t_distribution` has a
primary entry that lists all sections it appears in:

[$../students_t_eg_1.png]

Then those sections also have primary entries, which list all the search
terms those sections contain:

[$../students_t_eg_2.png]

Of course these automated index entries may not be quite what you're looking
for: often you'll get a few spurious entries, a few missing entries, and a
few entries where the section name used as an index entry is less than ideal.
So AutoIndex provides some powerful regular expression based rules that allow
you to add, remove, constrain, or rewrite entries.
Normally just a few lines in AutoIndex's script file are enough to tailor the
output to match the author's expectations.

AutoIndex also supports multiple indexes (as does Docbook), and since it
knows which search terms are ['function], ['class], ['macro] or ['typedef]
names, it can add the necessary attributes to the XML so that you can have
separate indexes for each of these different types. These specialised indexes
only contain entries for the ['function], ['class], ['macro] or ['typedef]
names; ['section names] are never used as primary index terms here, unlike
the main "include everything" index.

Finally, while the Docbook XSL stylesheets create nice indexes complete with
page numbers for PDF output, the HTML indexes look much less good, because
they use section titles in place of page numbers. Since AutoIndex already
uses section titles as index entries, this leads to a lot of repetition, so
as an alternative AutoIndex can be instructed to construct the index itself.
This is faster than using the XSL stylesheets, and now each index entry is a
hyperlink to the appropriate section:

[$../students_t_eg_3.png]

With internal index generation there is also a helpful navigation bar at the
start of each Index:

[$../students_t_eg_4.png]

[endsect]

[section:tut Getting Started and Tutorial]

[h4 Step 1: Build the tool]

Change directory to `tools/auto_index/build` and invoke bjam as:

   bjam release

Optionally pass the name of the compiler toolset you want to use to bjam as well:

   bjam release gcc

[h4 Step 2: Configure Boost.Build]

TODO: we need BoostBook integration!!!

Currently the tool can only be run manually.

[h4 Step 3: Add indexes to your documentation]

To add a single index to a BoostBook\/Docbook document, add `<index/>` at the
location where you want the index to appear. The index will be rendered as a
separate section when the documentation is built.

To add multiple indexes, give each one a title and set its `type` attribute
to specify which terms will be included. For example, to place the
['function], ['class], ['macro] or ['typedef] names indexed by ['auto_index]
in separate indexes along with a main "include everything" index, one
could add:

   <index type="class_name">
   <title>Class Index</title>
   </index>

   <index type="typedef_name">
   <title>Typedef Index</title>
   </index>

   <index type="function_name">
   <title>Function Index</title>
   </index>

   <index type="macro_name">
   <title>Macro Index</title>
   </index>

   <index/>

In quickbook, you add the same markup but enclose it in an escape:

   '''<index/>'''

[h4 Step 4: Create the script file]

AutoIndex works by reading a script file that tells it what to index. At its
simplest it will scan one or more headers for terms that should be indexed in
the documentation. So for example to scan "myheader.hpp" the script file
would just contain:

   !scan myheader.hpp

Or we can recursively scan through directories looking for all the files to
scan whose name matches a particular regular expression:

   !scan-path "../../../../boost/math" ".*\.hpp" true

Note how each argument is whitespace separated and can be optionally enclosed
in "double quotes". The final ['true] argument indicates that subdirectories
in `../../../../boost/math` should be searched in addition to that directory.

Often the ['scan] or ['scan-path] rules will bring in too many terms to
search for, so we need to be able to exclude terms as well:

   !exclude type

This excludes the term "type" from being indexed.

We can also add terms manually:

   foobar

will index occurrences of "foobar" and:

   foobar \<\w*(foo|bar)\w*\>

will index any whole word containing either "foo" or "bar" within it. This is
useful when you want to index a lot of similar or related words under one
entry, for example:

   reflex

will only index occurrences of "reflex" as a whole word, but:

   reflex \<reflex\w*\>

will index occurrences of "reflex", "reflexing" and "reflexed" all under the
same entry ['reflex].

This inclusion rule can also restrict the term to certain sections, and add
an index category that the term should belong to (so it only appears in
certain indexes).

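As a sketch of that fuller form, the line below combines an empty search
expression (so the term itself is matched as a whole word), a section
constraint, and a category. The term `myclass` and the section ID prefix
`mylib.examples` are hypothetical names chosen for illustration, and it is
assumed here that a manually added term may reuse the `class_name` category
that header scanning assigns:

```
myclass "" "mylib.examples.*" class_name
```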
Finally, the script can add rewrite rules that rename section names that are
automatically used as index entries. For example, we might want to remove
leading "A" or "The" prefixes from section titles when AutoIndex uses them as
an index entry:

   !rewrite-name "(?i)(?:A|The)\s+(.*)" "\1"

[h4 Step 5: Iterate]

Creating a good index is an iterative process. Often the first step is just
to add a header scanning rule to the script file, then generate the
documentation and see:

* What's missing.
* What's been included that shouldn't be.
* What's been included under a poor name.

Further rules can then be added to the script to handle these cases and the
next iteration examined, and so on.

[endsect]

[section:script_ref Script File Reference]

The following elements can occur in a script:

[h4 Simple Inclusions]

   term [regular-expression1 [regular-expression2 [category]]]

[variablelist
[[term][The term to index: this will form a primary entry in the Index with
the section title(s) containing the term as secondary entries, and also will
be used as a secondary entry beneath each of the section titles that the term
occurs in.]]
[[regular-expression1][An optional regular expression: each occurrence of the
regular expression in the text of the document will result in one index term
being emitted. If the regular expression is omitted or is "", then the
['term] itself will be used as the search text, and only occurrences of whole
words matching ['term] will be indexed.]]
[[regular-expression2][A constraint that specifies which sections are indexed
for ['term]: only if the ID of the section matches ['regular-expression2]
exactly will that section be indexed for occurrences of ['term].
For example: `myclass "" "mylib.examples.*"` will index occurrences of
"myclass" as a whole word only in sections whose ID begins "mylib.examples",
while `myclass "" "(?!mylib.introduction.*).*"` will index occurrences of
"myclass" in any section, except those whose IDs begin
"mylib.introduction".]]
[[category][Optionally, an index category to place occurrences of ['term] in.
If you have multiple indexes then this is the name assigned to the index's
"type" attribute.]]
]

[h4 Source File Scanning]

   !scan source-file-name

Scans the C\/C++ source file ['source-file-name] for definitions of
['function]s, ['class]es, ['macro]s or ['typedef]s and makes each of these a
term to be indexed. Terms found are assigned to the index category
"function_name", "class_name", "macro_name" or "typedef_name" depending on
how they were seen in the source file. These may then be included in a
specialised index whose "type" attribute has the same category name.

[h4 Directory and Source File Scanning]

   !scan-path directory-name file-name-regex [recurse]

[variablelist
[[directory-name][The directory to scan: this should be a path relative to
the script file and should use all forward slashes in its file name.]]
[[file-name-regex][A regular expression: any file in the directory whose name
matches the regular expression will be scanned for terms to index.]]
[[recurse][An optional Boolean value - either "true" or "false" - that
indicates whether to recurse into subdirectories.]]
]

[h4 Excluding Terms]

   !exclude term-list

Excludes all the terms in the whitespace separated ['term-list] from being
indexed. This should be placed /after/ any ['!scan] or ['!scan-path] rules
which may result in the terms becoming included.

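To show how the rules described so far fit together, here is a minimal
sketch of a complete script file. The directory `../include/mylib`, the term
`myclass`, and the section ID prefix `mylib.examples` are hypothetical; only
the directive names and argument order are taken from the descriptions above.
The exclusion comes after the scan rule, as required:

```
!scan-path "../include/mylib" ".*\.hpp" true
!exclude type
myclass "" "mylib.examples.*"
```

Whether such a script gives a usable index depends on the document, so treat
it as a starting point to refine rather than a finished configuration.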
[h4 Rewriting Section Names]

   !rewrite-id regular-expression new-name

[variablelist
[[regular-expression][A regular expression: all section IDs that match the
expression exactly will have index entries ['new-name] instead of their
title(s).]]
[[new-name][The name that the section will appear under in the index.]]
]

   !rewrite-name regular-expression format-text

[variablelist
[[regular-expression][A regular expression: all sections whose titles match
the regular expression exactly will have index entries composed of the
regular expression match combined with the regex format string
['format-text].]]
[[format-text][The Perl-style format string used to reformat the title.]]
]

[endsect]

[section:comm_ref Command Line Reference]

The following command line options are supported by auto_index:

[variablelist
[[in=infilename][Specifies the name of the XML input file to be indexed.]]
[[out=outfilename][Specifies the name of the new XML file to create.]]
[[scan=source-filename][Specifies that ['source-filename] should be scanned
for terms to index.]]
[[script=script-filename][Specifies the name of the script file to process.]]
[[--no-duplicates][If a term occurs more than once in the same section, then
include only one index entry.]]
[[--internal-index][Specifies that auto_index should generate the actual
indexes rather than inserting `<indexterm>`s and leaving index generation to
the XSL stylesheets.]]
]

[endsect]