28.4.2 Tags Tables

A tags table records the tags[1] extracted by scanning the source code of a certain program or a certain document. Tags extracted from generated files reference the original files, rather than the generated files that were scanned during tag extraction. Examples of generated files include C files generated from Cweb source files, from a Yacc parser, or from Lex scanner definitions; .i preprocessed C files; and Fortran files produced by preprocessing .fpp source files.

To produce a tags table, you run the etags shell command on a document or a source code file. The ‘etags’ program writes the tags to a tags table file, or tags file for short. The conventional name for a tags file is TAGS. See Create Tags Table. (It is also possible to create a tags table by using one of the commands from other packages that can produce such tables in the same format.)

Emacs uses the tags tables via the etags package as one of the supported backends for xref. Because tags tables are produced by the etags command that is part of an Emacs distribution, we describe tags tables in more detail here.

The Ebrowse facility is similar to etags but specifically tailored for C++. See Ebrowse in Ebrowse User’s Manual. The Semantic package provides another way to generate and use tags, separate from the etags facility. See Semantic.

• Tag Syntax: Tag syntax for various types of code and text files.
• Create Tags Table: Creating a tags table with etags.
• Etags Regexps: Create arbitrary tags using regular expressions.

28.4.2.1 Source File Tag Syntax

Here is how tag syntax is defined for the most popular languages:

  • In C code, any C function or typedef is a tag, and so are definitions of struct, union and enum. #define macro definitions, #undef and enum constants are also tags, unless you specify ‘--no-defines’ when making the tags table. Similarly, global variables are tags, unless you specify ‘--no-globals’, and so are struct members, unless you specify ‘--no-members’. Use of ‘--no-globals’, ‘--no-defines’ and ‘--no-members’ can make the tags table file much smaller.

    You can tag function declarations and external variables in addition to function definitions by giving the ‘--declarations’ option to etags.

  • In C++ code, in addition to all the tag constructs of C code, member functions are also recognized; member variables are also recognized, unless you use the ‘--no-members’ option. Operator definitions have tag names like ‘operator+’. If you specify the ‘--class-qualify’ option, tags for variables and functions in classes are named ‘class::variable’ and ‘class::function’. By default, class methods and members are not class-qualified, which makes it possible to identify their names in the sources more accurately.

  • In Java code, tags include all the constructs recognized in C++, plus the interface, extends and implements constructs. Tags for variables and functions in classes are named ‘class.variable’ and ‘class.function’.

  • In LaTeX documents, the arguments for \chapter, \section, \subsection, \subsubsection, \eqno, \label, \ref, \cite, \bibitem, \part, \appendix, \entry, \index, \def, \newcommand, \renewcommand, \newenvironment and \renewenvironment are tags.

    Other commands can make tags as well, if you specify them in the environment variable TEXTAGS before invoking etags. The value of this environment variable should be a colon-separated list of command names. For example,

    TEXTAGS="mycommand:myothercommand"
    export TEXTAGS

    specifies (using Bourne shell syntax) that the commands ‘\mycommand’ and ‘\myothercommand’ also define tags.

  • In Lisp code, any function defined with defun, any variable defined with defvar or defconst, and in general the first argument of any expression that starts with ‘(def’ in column zero is a tag. As an exception, expressions of the form (defvar foo) are treated as declarations, and are only tagged if the ‘--declarations’ option is given.

  • In Scheme code, tags include anything defined with def or with a construct whose name starts with ‘def’. They also include variables set with set! at top level in the file.

Several other languages are also supported:

  • In Ada code, functions, procedures, packages, tasks and types are tags. Use the ‘--packages-only’ option to create tags for packages only.

    In Ada, the same name can be used for different kinds of entity (e.g., for a procedure and for a function). Also, for things like packages, procedures and functions, there is the spec (i.e., the interface) and the body (i.e., the implementation). To make it easier to pick the definition you want, Ada tag names have suffixes indicating the type of entity:

    ‘/b’  package body.

    ‘/f’  function.

    ‘/k’  task.

    ‘/p’  procedure.

    ‘/s’  package spec.

    ‘/t’  type.

    Thus, M-x find-tag RET bidule/b RET will go directly to the body of the package bidule, while M-x find-tag RET bidule RET will just search for any tag bidule.

  • In assembler code, labels appearing at the start of a line, followed by a colon, are tags.

  • In Bison or Yacc input files, each rule defines as a tag the nonterminal it constructs. The portions of the file that contain C code are parsed as C code.

  • In Cobol code, tags are paragraph names; that is, any word starting in column 8 and followed by a period.

  • In Erlang code, the tags are the functions, records and macros defined in the file.

  • In Fortran code, functions, subroutines and block data are tags.

  • In Go code, packages, functions, and types are tags.

  • In HTML input files, the tags are the title and the h1, h2, h3 headers. In addition, name= attributes in anchors and all occurrences of id= are tags.

  • In Lua input files, all functions are tags.

  • In makefiles, targets are tags; additionally, variables are tags unless you specify ‘--no-globals’.

  • In Objective C code, tags include Objective C definitions for classes, class categories, methods and protocols. Tags for variables and functions in classes are named ‘class::variable’ and ‘class::function’.

  • In Pascal code, the tags are the functions and procedures defined in the file.

  • In Perl code, the tags are the packages, subroutines and variables defined by the package, sub, use constant, my, and local keywords. Use ‘--globals’ if you want to tag global variables. Tags for subroutines are named ‘package::sub’. The name for subroutines defined in the default package is ‘main::sub’.

  • In PHP code, tags are functions, classes and defines. Vars are tags too, unless you use the ‘--no-members’ option.

  • In PostScript code, the tags are the functions.

  • In Prolog code, tags are predicates and rules at the beginning of line.

  • In Python code, def or class at the beginning of a line generate a tag.

  • In Ruby code, def or class or module at the beginning of a line generate a tag. Constants also generate tags.

You can also generate tags based on regexp matching (see Etags Regexps) to handle other formats and languages.

28.4.2.2 Creating Tags Tables

The etags program is used to create a tags table file. It knows the syntax of several languages, as described in Tag Syntax. Here is how to run etags:

etags inputfiles…

The etags program reads the specified files, and writes a tags table named TAGS in the current working directory. You can optionally specify a different file name for the tags table by using the ‘--output=file’ option; specifying - as a file name prints the tags table to standard output. You can also append the newly created tags table to an existing file by using the ‘--append’ option.

If the specified files don’t exist, etags looks for compressed versions of them and uncompresses them to read them. Under MS-DOS, etags also looks for file names like mycode.cgz if it is given ‘mycode.c’ on the command line and mycode.c does not exist.

If the tags table becomes outdated due to changes in the files described in it, you can update it by running the etags program again. If the tags table does not record a tag, or records it for the wrong file, then Emacs will not be able to find that definition until you update the tags table. But if the position recorded in the tags table becomes a little bit wrong (due to other editing), Emacs will still be able to find the right position, with a slight delay.

Thus, there is no need to update the tags table after each edit. You should update a tags table when you define new tags that you want to have listed, or when you move tag definitions from one file to another, or when changes become substantial.

You can make a tags table include another tags table, by passing the ‘--include=file’ option to etags. It then covers all the files covered by the included tags file, as well as its own.

If you specify the source files with relative file names when you run etags, the tags file will contain file names relative to the directory where the tags file was initially written. This way, you can move an entire directory tree containing both the tags file and the source files, and the tags file will still refer correctly to the source files. If the tags file is - or is in the /dev directory, however, the file names are made relative to the current working directory. This is useful, for example, when writing the tags to the standard output.

When using a relative file name, it should not be a symbolic link pointing to a tags file in a different directory, because this would generally render the file names invalid.

If you specify absolute file names as arguments to etags, then the tags file will contain absolute file names. This way, the tags file will still refer to the same files even if you move it, as long as the source files remain in the same place. Absolute file names start with ‘/’, or with ‘device:/’ on MS-DOS and MS-Windows.

When you want to make a tags table from a great number of files, you may have problems listing them on the command line, because some systems have a limit on its length. You can circumvent this limit by telling etags to read the file names from its standard input, by typing a dash in place of the file names, like this:

find . -name "*.[chCH]" -print | etags -

etags recognizes the language used in an input file based on its file name and contents. It first tries to match the file’s name and extension to the ones commonly used with certain languages. Some languages have interpreters with known names (e.g., perl for Perl or pl for Prolog), so etags next looks for an interpreter specification of the form ‘#!interp’ on the first line of an input file, and matches that against known interpreters. If none of that works, or if you want to override the automatic detection of the language, you can specify the language explicitly with the ‘--language=name’ option. You can intermix these options with file names; each one applies to the file names that follow it. Specify ‘--language=auto’ to tell etags to resume guessing the language from the file names and file contents. Specify ‘--language=none’ to turn off language-specific processing entirely; then etags recognizes tags by regexp matching alone (see Etags Regexps). This comes in handy when an input file uses a language not yet supported by etags, and you want to avoid having etags fall back on Fortran and C as the default languages.

The option ‘--parse-stdin=file’ is mostly useful when calling etags from programs. It can be used (only once) in place of a file name on the command line. etags will read from standard input and mark the produced tags as belonging to the file file.

‘etags --help’ outputs the list of the languages etags knows, and the file name rules for guessing the language. It also prints a list of all the available etags options, together with a short explanation. If followed by one or more ‘--language=lang’ options, it outputs detailed information about how tags are generated for lang.

28.4.2.3 Etags Regexps

The ‘--regex’ option to etags allows tags to be recognized by regular expression matching. You can intermix this option with file names; each one applies to the source files that follow it. If you specify multiple ‘--regex’ options, all of them are used in parallel. The syntax is:

--regex=[{language}]/tagregexp/[nameregexp/]modifiers

The essential part of the option value is tagregexp, the regexp for matching tags. It is always used anchored, that is, it only matches at the beginning of a line. If you want to allow indented tags, use a regexp that matches initial whitespace; start it with ‘[ \t]*’.

In these regular expressions, ‘\’ quotes the next character, and all the C character escape sequences are supported: ‘\a’ for bell, ‘\b’ for backspace, ‘\e’ for escape, ‘\f’ for formfeed, ‘\n’ for newline, ‘\r’ for carriage return, ‘\t’ for tab, and ‘\v’ for vertical tab. In addition, ‘\d’ stands for the DEL character.

Ideally, tagregexp should not match more characters than are needed to recognize what you want to tag. If the syntax requires you to write tagregexp so it matches more characters beyond the tag itself, you should add a nameregexp, to pick out just the tag. This will enable Emacs to find tags more accurately and to do completion on tag names more reliably. In nameregexp, it is frequently convenient to use “back references” (see Regexp Backslash) to parenthesized groupings ‘\( … \)’ in tagregexp. For example, ‘\1’ refers to the first such parenthesized grouping. You can find some examples of this below.

The modifiers are a sequence of zero or more characters that modify the way etags does the matching. A regexp with no modifiers is applied sequentially to each line of the input file, in a case-sensitive way. The modifiers and their meanings are:

‘i’  Ignore case when matching this regexp.

‘m’  Match this regular expression against the whole file, so that multi-line matches are possible.

‘s’  Match this regular expression against the whole file, and allow ‘.’ in tagregexp to match newlines.

The ‘-R’ option cancels all the regexps defined by preceding ‘--regex’ options. It too applies to the file names following it. Here’s an example:

etags --regex=/reg1/i voo.doo --regex=/reg2/m \
bar.ber -R --lang=lisp los.er

Here etags chooses the parsing language for voo.doo and bar.ber according to their contents. etags also uses reg1 to recognize additional tags in voo.doo, and both reg1 and reg2 to recognize additional tags in bar.ber. reg1 is checked against each line of voo.doo and bar.ber, in a case-insensitive way, while reg2 is checked against the whole bar.ber file, permitting multi-line matches, in a case-sensitive way. etags uses only the Lisp tags rules, with no user-specified regexp matching, to recognize tags in los.er.

You can restrict a ‘--regex’ option to match only files of a given language by using the optional prefix {language}. (‘etags --help’ prints the list of languages recognized by etags.) This is particularly useful when storing many predefined regular expressions for etags in a file. The following example tags the DEFVAR macros in the Emacs source files, for the C language only:

--regex='{c}/[ \t]*DEFVAR_[A-Z_ \t(]+"\([^"]+\)"/\1/'

When you have complex regular expressions, you can store the list of them in a file. The following option syntax instructs etags to read two files of regular expressions. The regular expressions contained in the second file are matched without regard to case.

--regex=@case-sensitive-file --ignore-case-regex=@ignore-case-file

A regex file for etags contains one regular expression per line. Empty lines, and lines beginning with space or tab are ignored. When the first character in a line is ‘@’, etags assumes that the rest of the line is the name of another file of regular expressions; thus, one such file can include another file. All the other lines are taken to be regular expressions. If the first non-whitespace text on the line is ‘--’, that line is a comment.

For example, we can create a file called ‘emacs.tags’ with the following contents:

        -- This is for GNU Emacs C source files
{c}/[ \t]*DEFVAR_[A-Z_ \t(]+"\([^"]+\)"/\1/

and then use it like this:

etags --regex=@emacs.tags *.[ch] */*.[ch]

Here are some more examples. The regexps are quoted to protect them from shell interpretation.

  • Tag Octave files:

    etags --language=none \
    --regex='/[ \t]*function.*=[ \t]*\([^ \t]*\)[ \t]*(/\1/' \
    --regex='/###key \(.*\)/\1/' \
    --regex='/[ \t]*global[ \t].*/' \
    *.m

    Note that tags are not generated for scripts, so you have to add a line of the form ‘###key scriptname’ yourself if you want to jump to a script.

  • Tag Tcl files:

    etags --language=none --regex='/proc[ \t]+\([^ \t]+\)/\1/' *.tcl

  • Tag VHDL files:

    etags --language=none \
    --regex='/[ \t]*\(ARCHITECTURE\|CONFIGURATION\) +[^ ]* +OF/' \
    --regex='/[ \t]*\(ATTRIBUTE\|ENTITY\|FUNCTION\|PACKAGE\( BODY\)?\|PROCEDURE\|PROCESS\|TYPE\)[ \t]+\([^ \t(]+\)/\3/'

  1. A tag is a synonym for identifier reference. Commands and features based on the etags package traditionally use “tag” with this meaning, and this subsection follows that tradition.