28.4.2 Tags Tables
A tags table records the tags[1] extracted by scanning the source code of a certain program or a certain document. Tags extracted from generated files reference the original files, rather than the generated files that were scanned during tag extraction. Examples of generated files include C files generated from Cweb source files, from a Yacc parser, or from Lex scanner definitions; .i preprocessed C files; and Fortran files produced by preprocessing .fpp source files.
To produce a tags table, you run the etags shell command on a document or the source code file. The ‘etags’ program writes the tags to a tags table file, or tags file for short. The conventional name for a tags file is TAGS. See Create Tags Table. (It is also possible to create a tags table by using one of the commands from other packages that can produce such tables in the same format.)
Emacs uses the tags tables via the etags package as one of the supported backends for xref. Because tags tables are produced by the etags command that is part of an Emacs distribution, we describe tags tables in more detail here.
The Ebrowse facility is similar to etags but specifically tailored for C++. See Ebrowse in Ebrowse User's Manual. The Semantic package provides another way to generate and use tags, separate from the etags facility. See Semantic.
• Tag Syntax: Tag syntax for various types of code and text files.
• Create Tags Table: Creating a tags table with etags.
• Etags Regexps: Create arbitrary tags using regular expressions.
28.4.2.1 Source File Tag Syntax
Here is how tag syntax is defined for the most popular languages:
In C code, any C function or typedef is a tag, and so are definitions of struct, union and enum. #define macro definitions, #undef and enum constants are also tags, unless you specify ‘--no-defines’ when making the tags table. Similarly, global variables are tags, unless you specify ‘--no-globals’, and so are struct members, unless you specify ‘--no-members’. Use of ‘--no-globals’, ‘--no-defines’ and ‘--no-members’ can make the tags table file much smaller.
You can tag function declarations and external variables in addition to function definitions by giving the ‘--declarations’ option to etags.
In C++ code, in addition to all the tag constructs of C code, member functions are also recognized; member variables are also recognized, unless you use the ‘--no-members’ option. operator definitions have tag names like ‘operator+’. If you specify the ‘--class-qualify’ option, tags for variables and functions in classes are named ‘class::variable’ and ‘class::function’. By default, class methods and members are not class-qualified, which makes it possible to identify their names in the sources more accurately.
In Java code, tags include all the constructs recognized in C++, plus the interface, extends and implements constructs. Tags for variables and functions in classes are named ‘class.variable’ and ‘class.function’.
In LaTeX documents, the arguments for \chapter, \section, \subsection, \subsubsection, \eqno, \label, \ref, \cite, \bibitem, \part, \appendix, \entry, \index, \def, \newcommand, \renewcommand, \newenvironment and \renewenvironment are tags.
Other commands can make tags as well, if you specify them in the environment variable TEXTAGS before invoking etags. The value of this environment variable should be a colon-separated list of command names. For example,
TEXTAGS="mycommand:myothercommand"
export TEXTAGS
specifies (using Bourne shell syntax) that the commands ‘\mycommand’ and ‘\myothercommand’ also define tags.
In Lisp code, any function defined with defun, any variable defined with defvar or defconst, and in general the first argument of any expression that starts with ‘(def’ in column zero is a tag. As an exception, expressions of the form (defvar foo) are treated as declarations, and are only tagged if the ‘--declarations’ option is given.
In Scheme code, tags include anything defined with def or with a construct whose name starts with ‘def’. They also include variables set with set! at top level in the file.
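As a rough illustration of the C options described above, the following shell sketch builds two tags tables from the same file and compares their sizes. The file sample.c, its contents, and the output names TAGS.full and TAGS.small are invented for this example; only the etags options come from the text.

```shell
# Work in a scratch directory so no existing TAGS file is touched.
cd "$(mktemp -d)" || exit 1

# sample.c is a hypothetical input invented for this sketch.
cat > sample.c <<'EOF'
#define LIMIT 10
int counter;
int
bump (void)
{
  return ++counter;
}
EOF

# Default behavior: macros, global variables and functions are tagged.
etags --output=TAGS.full sample.c

# Leave out #define macros and global variables for a smaller table.
etags --no-defines --no-globals --output=TAGS.small sample.c

# Compare the sizes of the two tables.
wc -c TAGS.full TAGS.small
```

On a file this small the difference is only a few bytes; the saving matters for large source trees with many macros and globals.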
Several other languages are also supported:
In Ada code, functions, procedures, packages, tasks and types are tags. Use the ‘--packages-only’ option to create tags for packages only.
In Ada, the same name can be used for different kinds of entity (e.g., for a procedure and for a function). Also, for things like packages, procedures and functions, there is the spec (i.e., the interface) and the body (i.e., the implementation). To make it easier to pick the definition you want, Ada tag names have suffixes indicating the type of entity:
‘/b’ package body.
‘/f’ function.
‘/k’ task.
‘/p’ procedure.
‘/s’ package spec.
‘/t’ type.
Thus, M-x find-tag RET bidule/b RET will go directly to the body of the package bidule, while M-x find-tag RET bidule RET will just search for any tag bidule.
In assembler code, labels appearing at the start of a line, followed by a colon, are tags.
In Bison or Yacc input files, each rule defines as a tag the nonterminal it constructs. The portions of the file that contain C code are parsed as C code.
In Cobol code, tags are paragraph names; that is, any word starting in column 8 and followed by a period.
In Erlang code, the tags are the functions, records and macros defined in the file.
In Fortran code, functions, subroutines and block data are tags.
In Go code, packages, functions, and types are tags.
In HTML input files, the tags are the title and the h1, h2, h3 headers. Also, tags are name= in anchors and all occurrences of id=.
In Lua input files, all functions are tags.
In makefiles, targets are tags; additionally, variables are tags unless you specify ‘--no-globals’.
In Objective C code, tags include Objective C definitions for classes, class categories, methods and protocols. Tags for variables and functions in classes are named ‘class::variable’ and ‘class::function’.
In Pascal code, the tags are the functions and procedures defined in the file.
In Perl code, the tags are the packages, subroutines and variables defined by the package, sub, use constant, my, and local keywords. Use ‘--globals’ if you want to tag global variables. Tags for subroutines are named ‘package::sub’. The name for subroutines defined in the default package is ‘main::sub’.
In PHP code, tags are functions, classes and defines. Vars are tags too, unless you use the ‘--no-members’ option.
In PostScript code, the tags are the functions.
In Prolog code, tags are predicates and rules at the beginning of line.
In Python code, def or class at the beginning of a line generate a tag.
In Ruby code, def or class or module at the beginning of a line generate a tag. Constants also generate tags.
You can also generate tags based on regexp matching (see Etags Regexps) to handle other formats and languages.
28.4.2.2 Creating Tags Tables
The etags program is used to create a tags table file. It knows the syntax of several languages, as described in Tag Syntax. Here is how to run etags:
etags inputfiles…
The etags program reads the specified files, and writes a tags table named TAGS in the current working directory. You can optionally specify a different file name for the tags table by using the ‘--output=file’ option; specifying - as a file name prints the tags table to standard output. You can also append the newly created tags table to an existing file by using the ‘--append’ option.
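The output options above can be tried with a short shell sketch. The files a.c and b.c and the table name MYTAGS are hypothetical and exist only for this illustration.

```shell
# Work in a scratch directory so no existing TAGS file is touched.
cd "$(mktemp -d)" || exit 1

# a.c and b.c are hypothetical inputs invented for this sketch.
printf 'int\nalpha (void)\n{\n  return 1;\n}\n' > a.c
printf 'int\nbeta (void)\n{\n  return 2;\n}\n' > b.c

# Write the table to MYTAGS instead of the default TAGS.
etags --output=MYTAGS a.c

# Add the tags for b.c to the same file without rewriting it.
etags --append --output=MYTAGS b.c

# Print a table on standard output rather than writing a file.
etags --output=- a.c
```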
If the specified files don't exist, etags looks for compressed versions of them and uncompresses them to read them. Under MS-DOS, etags also looks for file names like mycode.cgz if it is given ‘mycode.c’ on the command line and mycode.c does not exist.
If the tags table becomes outdated due to changes in the files described in it, you can update it by running the etags program again. If the tags table does not record a tag, or records it for the wrong file, then Emacs will not be able to find that definition until you update the tags table. But if the position recorded in the tags table becomes a little bit wrong (due to other editing), Emacs will still be able to find the right position, with a slight delay.
Thus, there is no need to update the tags table after each edit. You should update a tags table when you define new tags that you want to have listed, or when you move tag definitions from one file to another, or when changes become substantial.
You can make a tags table include another tags table, by passing the ‘--include=file’ option to etags. It then covers all the files covered by the included tags file, as well as its own.
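A minimal sketch of one table including another, assuming a hypothetical layout with a lib directory and an app directory that each hold one C file:

```shell
# Work in a scratch directory so no existing TAGS file is touched.
cd "$(mktemp -d)" || exit 1
mkdir lib app

# Hypothetical sources for a library and an application that uses it.
printf 'int\nlib_init (void)\n{\n  return 0;\n}\n' > lib/lib.c
printf 'int\nmain (void)\n{\n  return 0;\n}\n' > app/main.c

# Build the library's own tags table first.
(cd lib && etags lib.c)

# The application's table refers to the library's table by name.
(cd app && etags --include=../lib/TAGS main.c)
```

The application's table stays small because it only names the included file; the library's tags are not copied into it.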
If you specify the source files with relative file names when you run etags, the tags file will contain file names relative to the directory where the tags file was initially written. This way, you can move an entire directory tree containing both the tags file and the source files, and the tags file will still refer correctly to the source files. If the tags file is - or is in the /dev directory, however, the file names are made relative to the current working directory. This is useful, for example, when writing the tags to the standard output.
When using a relative file name, it should not be a symbolic link pointing to a tags file in a different directory, because this would generally render the file names invalid.
If you specify absolute file names as arguments to etags, then the tags file will contain absolute file names. This way, the tags file will still refer to the same files even if you move it, as long as the source files remain in the same place. Absolute file names start with ‘/’, or with ‘device:/’ on MS-DOS and MS-Windows.
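The difference between relative and absolute arguments can be seen by comparing the file-name lines of two tables. In this sketch the directory src, the file main.c and the table names TAGS.rel and TAGS.abs are all hypothetical.

```shell
# Work in a scratch directory so no existing TAGS file is touched.
cd "$(mktemp -d)" || exit 1
mkdir src
printf 'int\nmain (void)\n{\n  return 0;\n}\n' > src/main.c

# Relative argument: the table records "src/main.c", relative to the
# directory that holds the tags file.
etags --output=TAGS.rel src/main.c

# Absolute argument: the table records the full path instead.
etags --output=TAGS.abs "$PWD/src/main.c"

# Show the file-name lines of both tables.
grep 'main\.c,' TAGS.rel TAGS.abs
```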
When you want to make a tags table from a great number of files, you may have problems listing them on the command line, because some systems have a limit on its length. You can circumvent this limit by telling etags to read the file names from its standard input, by typing a dash in place of the file names, like this:
find . -name "*.[chCH]" -print | etags -
etags recognizes the language used in an input file based on its file name and contents. It first tries to match the file's name and extension to the ones commonly used with certain languages. Some languages have interpreters with known names (e.g., perl for Perl or pl for Prolog), so etags next looks for an interpreter specification of the form ‘#!interp’ on the first line of an input file, and matches that against known interpreters. If none of that works, or if you want to override the automatic detection of the language, you can specify the language explicitly with the ‘--language=name’ option. You can intermix these options with file names; each one applies to the file names that follow it. Specify ‘--language=auto’ to tell etags to resume guessing the language from the file names and file contents. Specify ‘--language=none’ to turn off language-specific processing entirely; then etags recognizes tags by regexp matching alone (see Etags Regexps). This comes in handy when an input file uses a language not yet supported by etags, and you want to avoid having etags fall back on Fortran and C as the default languages.
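The way ‘--language’ options apply to the file names that follow them can be sketched as below. Both input files are hypothetical, and the .tmpl extension was picked on the assumption that etags has no rule for it.

```shell
# Work in a scratch directory so no existing TAGS file is touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical inputs: C code in a file with an unfamiliar extension,
# plus an ordinary Python file.
printf 'int\nhandle_event (int code)\n{\n  return code;\n}\n' > handlers.tmpl
printf 'def run():\n    pass\n' > tool.py

# Treat handlers.tmpl as C, then return to automatic detection for
# the files that follow.
etags --language=c handlers.tmpl --language=auto tool.py
```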
The option ‘--parse-stdin=file’ is mostly useful when calling etags from programs. It can be used (only once) in place of a file name on the command line. etags will read from standard input and mark the produced tags as belonging to the file file.
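One possible use of ‘--parse-stdin’ is to tag text that has been filtered on the fly while still attributing the tags to the original file. The sketch below assumes a hypothetical file src.c and a made-up sed substitution.

```shell
# Work in a scratch directory so no existing TAGS file is touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical source file invented for this sketch.
printf 'int\nold_name (void)\n{\n  return 0;\n}\n' > src.c

# Tag the filtered text read from standard input, but record the
# tags as belonging to src.c.
sed 's/old_name/new_name/' src.c | etags --parse-stdin=src.c
```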
‘etags --help’ outputs the list of the languages etags knows, and the file name rules for guessing the language. It also prints a list of all the available etags options, together with a short explanation. If followed by one or more ‘--language=lang’ options, it outputs detailed information about how tags are generated for lang.
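For instance, the two forms of the help request look like this; Python is chosen here only as an example language.

```shell
# List the known languages, the file-name rules and all options.
etags --help

# Show only how tags are generated for one language, here Python.
etags --help --language=python
```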
28.4.2.3 Etags Regexps
The ‘--regex’ option to etags allows tags to be recognized by regular expression matching. You can intermix this option with file names; each one applies to the source files that follow it. If you specify multiple ‘--regex’ options, all of them are used in parallel. The syntax is:
--regex=[{language}]/tagregexp/[nameregexp/]modifiers
The essential part of the option value is tagregexp, the regexp for matching tags. It is always used anchored, that is, it only matches at the beginning of a line. If you want to allow indented tags, use a regexp that matches initial whitespace; start it with ‘[ \t]*’.
In these regular expressions, ‘\’ quotes the next character, and all the C character escape sequences are supported: ‘\a’ for bell, ‘\b’ for back space, ‘\e’ for escape, ‘\f’ for formfeed, ‘\n’ for newline, ‘\r’ for carriage return, ‘\t’ for tab, and ‘\v’ for vertical tab. In addition, ‘\d’ stands for the DEL character.
Ideally, tagregexp should not match more characters than are needed to recognize what you want to tag. If the syntax requires you to write tagregexp so it matches more characters beyond the tag itself, you should add a nameregexp, to pick out just the tag. This will enable Emacs to find tags more accurately and to do completion on tag names more reliably. In nameregexp, it is frequently convenient to use “back references” (see Regexp Backslash) to parenthesized groupings ‘\( … \)’ in tagregexp. For example, ‘\1’ refers to the first such parenthesized grouping. You can find some examples of this below.
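A small sketch of tagregexp and nameregexp working together, assuming a hypothetical file build.cfg whose lines use a made-up "task NAME:" syntax:

```shell
# Work in a scratch directory so no existing TAGS file is touched.
cd "$(mktemp -d)" || exit 1

# build.cfg and its "task NAME:" syntax are hypothetical.
printf 'task compile: cc -c main.c\n  task link: cc -o app main.o\n' > build.cfg

# tagregexp matches optional indentation, the keyword and the name;
# the nameregexp \1 keeps only the name as the tag.
etags --language=none \
      --regex='/[ \t]*task[ \t]+\([a-z_]+\)/\1/' \
      build.cfg
```

Here the match has to include the keyword task to be recognized at all, so the back reference is what narrows the tag name down to compile and link.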
The modifiers are a sequence of zero or more characters that modify the way etags does the matching. A regexp with no modifiers is applied sequentially to each line of the input file, in a case-sensitive way. The modifiers and their meanings are:
‘i’
Ignore case when matching this regexp.
‘m’
Match this regular expression against the whole file, so that multi-line matches are possible.
‘s’
Match this regular expression against the whole file, and allow ‘.’ in tagregexp to match newlines.
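The effect of the ‘i’ modifier can be checked by running the same regexp with and without it. The file notes.txt, its "section" markers and the two table names are hypothetical.

```shell
# Work in a scratch directory so no existing TAGS file is touched.
cd "$(mktemp -d)" || exit 1

# notes.txt and its section markers are hypothetical.
printf 'Section Intro\nsection details\nSECTION Summary\n' > notes.txt

# No modifier: the match is case-sensitive, so only the line that
# starts with lowercase "section" is tagged.
etags --language=none --output=TAGS.case \
      --regex='/section[ \t]+\([A-Za-z]+\)/\1/' notes.txt

# With the i modifier, all three spellings of the keyword match.
etags --language=none --output=TAGS.nocase \
      --regex='/section[ \t]+\([A-Za-z]+\)/\1/i' notes.txt
```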
The ‘-R’ option cancels all the regexps defined by preceding ‘--regex’ options. It too applies to the file names following it. Here's an example:
etags --regex=/reg1/i voo.doo --regex=/reg2/m \
bar.ber -R --lang=lisp los.er
Here etags chooses the parsing language for voo.doo and bar.ber according to their contents. etags also uses reg1 to recognize additional tags in voo.doo, and both reg1 and reg2 to recognize additional tags in bar.ber. reg1 is checked against each line of voo.doo and bar.ber, in a case-insensitive way, while reg2 is checked against the whole bar.ber file, permitting multi-line matches, in a case-sensitive way. etags uses only the Lisp tags rules, with no user-specified regexp matching, to recognize tags in los.er.
You can restrict a ‘--regex’ option to match only files of a given language by using the optional prefix {language}. (‘etags --help’ prints the list of languages recognized by etags.) This is particularly useful when storing many predefined regular expressions for etags in a file. The following example tags the DEFVAR macros in the Emacs source files, for the C language only:
--regex='{c}/[ \t]*DEFVAR_[A-Z_ \t(]+"\([^"]+\)"/\1/'
When you have complex regular expressions, you can store the list of them in a file. The following option syntax instructs etags to read two files of regular expressions. The regular expressions contained in the second file are matched without regard to case.
--regex=@case-sensitive-file --ignore-case-regex=@ignore-case-file
A regex file for etags contains one regular expression per line. Empty lines, and lines beginning with space or tab are ignored. When the first character in a line is ‘@’, etags assumes that the rest of the line is the name of another file of regular expressions; thus, one such file can include another file. All the other lines are taken to be regular expressions. If the first non-whitespace text on the line is ‘--’, that line is a comment.
For example, we can create a file called ‘emacs.tags’ with the following contents:
-- This is for GNU Emacs C source files
{c}/[ \t]*DEFVAR_[A-Z_ \t(]+"\([^"]+\)"/\1/
and then use it like this:
etags --regex=@emacs.tags *.[ch] */*.[ch]
Here are some more examples. The regexps are quoted to protect them from shell interpretation.
Tag Octave files:
etags --language=none \
--regex='/[ \t]*function.*=[ \t]*\([^ \t]*\)[ \t]*(/\1/' \
--regex='/###key \(.*\)/\1/' \
--regex='/[ \t]*global[ \t].*/' \
*.m
Note that tags are not generated for scripts, so you have to add a line of the form ‘###key scriptname’ yourself if you want to jump to one.
Tag Tcl files:
etags --language=none --regex='/proc[ \t]+\([^ \t]+\)/\1/' *.tcl
Tag VHDL files:
etags --language=none \
--regex='/[ \t]*\(ARCHITECTURE\|CONFIGURATION\) +[^ ]* +OF/' \
--regex='/[ \t]*\(ATTRIBUTE\|ENTITY\|FUNCTION\|PACKAGE\
\( BODY\)?\|PROCEDURE\|PROCESS\|TYPE\)[ \t]+\([^ \t(]+\)/\3/'
[1] A tag is a synonym for identifier reference. Commands and features based on the etags package traditionally use “tag” with this meaning, and this subsection follows that tradition.