A target A depends upon a target B if B is needed by A at build or execution time. The depends-upon relation induces a directed acyclic graph (DAG) over targets, which is called a dependency graph.
A target's direct dependencies are those other targets reachable by a path of length 1 in the dependency graph. A target's transitive dependencies are those targets upon which it depends via a path of any length through the graph.
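To make the two definitions concrete, here is a minimal Python sketch that models a dependency graph as a dictionary from each target to the targets it names directly. The graph contents and the helper names `direct_deps` and `transitive_deps` are hypothetical and chosen only for illustration; this is not how any particular build tool stores its graph.

```python
# Hypothetical dependency graph: each target maps to the targets it
# depends on directly (edges of length 1).
GRAPH = {
    "//a:a": ["//b:b"],
    "//b:b": ["//c:c"],
    "//c:c": [],
}


def direct_deps(graph, target):
    """Targets reachable from `target` by a path of length 1."""
    return set(graph.get(target, []))


def transitive_deps(graph, target):
    """Targets reachable from `target` by a path of one or more edges."""
    seen = set()
    stack = list(graph.get(target, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, []))
    return seen


print(sorted(direct_deps(GRAPH, "//a:a")))      # ['//b:b']
print(sorted(transitive_deps(GRAPH, "//a:a")))  # ['//b:b', '//c:c']
```

The iterative walk with a `seen` set visits each target once, even when several paths lead to it.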
In fact, in the context of builds, there are two dependency graphs, the graph of actual dependencies and the graph of declared dependencies. Most of the time, the two graphs are so similar that this distinction need not be made, but it is useful for the discussion below.
Actual and declared dependencies
A target X is actually dependent on target Y if Y must be present, built, and up-to-date in order for X to be built correctly. Built could mean generated, processed, compiled, linked, archived, compressed, executed, or any of the other kinds of tasks that routinely occur during a build.
A target X has a declared dependency on target Y if there is a dependency edge from X to Y in the package of X.
For correct builds, the graph of actual dependencies A must be a subgraph of the graph of declared dependencies D. That is, every pair of directly-connected nodes x --> y in A must also be directly connected in D. It can be said that D is an overapproximation of A.
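The subgraph condition can be checked mechanically once both graphs are written down. The following Python sketch assumes each graph is a dictionary from a target to its direct successors; the function name `undeclared_edges` and the sample graphs are hypothetical, not part of any build tool.

```python
def undeclared_edges(actual, declared):
    """Return the direct edges present in `actual` but absent from `declared`.

    Both graphs map a target to the list of targets it points at directly.
    An empty result means `actual` is a subgraph of `declared`.
    """
    missing = set()
    for src, dsts in actual.items():
        for dst in dsts:
            if dst not in declared.get(src, []):
                missing.add((src, dst))
    return missing


# Hypothetical graphs: D declares one edge more than A actually needs.
A = {"x": ["y"], "y": []}
D = {"x": ["y", "z"], "y": [], "z": []}

print(undeclared_edges(A, D))  # set(): D overapproximates A
print(undeclared_edges(D, A))  # {('x', 'z')}: A does not cover D
```

Note that the check compares direct edges only. A dependency that is merely reachable through some other declared target does not count as declared.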
BUILD file writers must explicitly declare all of the actual direct dependencies for every rule to the build system, and no more.
Failure to observe this principle causes undefined behavior: the build may fail, but worse, the build may depend on some prior operations, or upon transitive declared dependencies the target happens to have. Bazel checks for missing dependencies and reports errors, but it's not possible for this checking to be complete in all cases.
You need not (and should not) attempt to list everything indirectly imported, even if it is needed by A at execution time.
During a build of target X, the build tool inspects the entire transitive closure of dependencies of X to ensure that any changes in those targets are reflected in the final result, rebuilding intermediates as needed.
The transitive nature of dependencies leads to a common mistake. Sometimes, code in one file may use code provided by an indirect dependency, that is, a transitive but not direct edge in the declared dependency graph. Indirect dependencies don't appear in the BUILD file. Because the rule doesn't directly depend on the provider, there is no way to track changes, as shown in the following example timeline:
1. Declared dependencies match actual dependencies
At first, everything works. The code in package a uses code in package b. The code in package b uses code in package c, and thus a transitively depends on c.
| a/BUILD | b/BUILD |
|---|---|
| rule(name = "a", srcs = "a.in", deps = "//b:b") | rule(name = "b", srcs = "b.in", deps = "//c:c") |
| a/a.in | b/b.in |
| import b; b.foo(); | import c; function foo() { c.bar(); } |
The declared dependencies overapproximate the actual dependencies. All is well.
2. Adding an undeclared dependency
A latent hazard is introduced when someone adds code to a that creates a direct actual dependency on c, but forgets to declare it in the build file a/BUILD.
| a/a.in |
|---|
| import b; import c; b.foo(); c.garply(); |
The declared dependencies no longer overapproximate the actual dependencies. This may build OK, because the transitive closures of the two graphs are equal, but it masks a problem: a has an actual but undeclared dependency on c.
3. Divergence between declared and actual dependency graphs
The hazard is revealed when someone refactors b so that it no longer depends on c, inadvertently breaking a through no fault of their own.
| b/BUILD |
|---|
| rule(name = "b", srcs = "b.in", deps = "//d:d") |
| b/b.in |
| import d; function foo() { d.baz(); } |
The declared dependency graph is now an underapproximation of the actual dependencies, even when transitively closed; the build is likely to fail.
The problem could have been averted by ensuring that the actual dependency from a to c introduced in Step 2 was properly declared in the BUILD file.
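The three steps of the timeline can be replayed with a small Python sketch. It assumes each graph is a dictionary of direct edges between the packages a, b, c, and d; the helper names `reachable` and `classify` and the labels "ok", "latent hazard", and "broken" are hypothetical, chosen only to mirror the prose.

```python
def reachable(graph, start):
    """All targets reachable from `start` via one or more edges."""
    seen, stack = set(), list(graph.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


def classify(actual, declared, target):
    """Describe how `target`'s declared deps relate to its actual deps."""
    direct_ok = set(actual.get(target, [])) <= set(declared.get(target, []))
    closure_ok = reachable(actual, target) <= reachable(declared, target)
    if direct_ok:
        return "ok"
    return "latent hazard" if closure_ok else "broken"


# Step 1: a uses b, b uses c; everything is declared.
actual_1 = {"a": ["b"], "b": ["c"], "c": []}
declared_1 = {"a": ["b"], "b": ["c"], "c": []}

# Step 2: a starts using c directly, but a/BUILD is not updated.
actual_2 = {"a": ["b", "c"], "b": ["c"], "c": []}
declared_2 = declared_1

# Step 3: b is refactored to depend on d instead of c.
actual_3 = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
declared_3 = {"a": ["b"], "b": ["d"], "c": [], "d": []}

steps = [(actual_1, declared_1), (actual_2, declared_2), (actual_3, declared_3)]
for number, (act, dec) in enumerate(steps, 1):
    print(number, classify(act, dec, "a"))
# 1 ok
# 2 latent hazard
# 3 broken
```

In Step 2 the direct-edge check already fails while the transitive closures still agree, which is why the build may keep passing until Step 3.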
Types of dependencies
Most build rules have three attributes for specifying different kinds of generic dependencies: srcs, deps and data. These are explained below. For more details, see Attributes common to all rules.
Many rules also have additional attributes for rule-specific kinds of dependencies, for example, compiler or resources. These are detailed in the Build Encyclopedia.
srcs dependencies
Files consumed directly by the rule, or rules that output source files.
deps dependencies
Rules pointing to separately compiled modules that provide header files, symbols, libraries, data, and so on.
data dependencies
A build target might need some data files to run correctly. These data files aren't source code: they don't affect how the target is built. For example, a unit test might compare a function's output to the contents of a file. When you build the unit test you don't need the file, but you do need it when you run the test. The same applies to tools that are launched during execution.
The build system runs tests in an isolated directory where only files listed as data are available. Thus, if a binary/library/test needs some files to run, specify them (or a build rule containing them) in data. For example:
# I need a config file from a directory named env:
java_binary(
    name = "setenv",
    ...
    data = [":env/default_env.txt"],
)

# I need test data from another directory
sh_test(
    name = "regtest",
    srcs = ["regtest.sh"],
    data = [
        "//data:file1.txt",
        "//data:file2.txt",
        ...
    ],
)
These files are available using the relative path path/to/data/file. In tests, you can refer to these files by joining the paths of the test's source directory and the workspace-relative path, for example, ${TEST_SRCDIR}/workspace/path/to/data/file.
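As a rough illustration of the path joining described above, a test written in Python might build the path as follows. The helper name `data_file_path` is hypothetical, and the `workspace` and `path/to/data/file` arguments are the same placeholders used in the text, not real locations. The sketch takes the environment as a parameter so that it also runs outside a test.

```python
import os


def data_file_path(workspace, relative_path, environ=os.environ):
    """Join the test source directory, workspace name, and data path.

    `workspace` and `relative_path` are placeholders supplied by the caller;
    TEST_SRCDIR is the environment variable named in the text above.
    """
    return os.path.join(environ["TEST_SRCDIR"], workspace, relative_path)


# Usage with a stand-in environment; inside a real test, omit `fake_env`
# so that the value of TEST_SRCDIR set by the test runner is used.
fake_env = {"TEST_SRCDIR": "/tmp/test_srcdir"}
print(data_file_path("workspace", "path/to/data/file", fake_env))
```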
Using labels to reference directories
As you look over our BUILD files, you might notice that some data labels refer to directories. These labels end with /. or / like these examples, which you should not use:
Not recommended —
data = ["//data/regression:unittest/."]
Not recommended —
data = ["testdata/."]
Not recommended —
data = ["testdata/"]
This seems convenient, particularly for tests because it allows a test to use all the data files in the directory.
But try not to do this. In order to ensure correct incremental rebuilds (and
re-execution of tests) after a change, the build system must be aware of the
complete set of files that are inputs to the build (or test). When you specify
a directory, the build system performs a rebuild only when the directory itself
changes (due to addition or deletion of files), but won't be able to detect
edits to individual files as those changes don't affect the enclosing directory.
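The difference between watching a directory and watching its files can be demonstrated with a small Python sketch. It is a simplified model under stated assumptions, not a description of how the build system detects changes: `listing_fingerprint` stands in for "the directory itself" by hashing only entry names, while `content_fingerprint` stands in for an enumerated file list by hashing names and contents. Both function names are hypothetical, and the sketch assumes a flat directory with no subdirectories.

```python
import hashlib
import os
import tempfile


def listing_fingerprint(directory):
    """Hash only the names of the entries in `directory`."""
    names = sorted(os.listdir(directory))
    return hashlib.sha256("\n".join(names).encode()).hexdigest()


def content_fingerprint(directory):
    """Hash the name and the bytes of every file in `directory`."""
    digest = hashlib.sha256()
    for name in sorted(os.listdir(directory)):
        digest.update(name.encode())
        with open(os.path.join(directory, name), "rb") as f:
            digest.update(f.read())
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as testdata:
    golden = os.path.join(testdata, "golden.txt")
    with open(golden, "w") as f:
        f.write("expected output v1\n")

    listing_before = listing_fingerprint(testdata)
    content_before = content_fingerprint(testdata)

    # Edit an existing file: the set of directory entries is unchanged.
    with open(golden, "w") as f:
        f.write("expected output v2\n")

    edit_seen_by_listing = listing_fingerprint(testdata) != listing_before
    edit_seen_by_content = content_fingerprint(testdata) != content_before

    # Add a file: the directory entries themselves change.
    with open(os.path.join(testdata, "extra.txt"), "w") as f:
        f.write("new\n")
    add_seen_by_listing = listing_fingerprint(testdata) != listing_before

print(edit_seen_by_listing, edit_seen_by_content, add_seen_by_listing)
# False True True
```

The edit to golden.txt is invisible to the listing-based fingerprint, which corresponds to a stale test result after a data file changes.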
Rather than specifying directories as inputs to the build system, you should enumerate the set of files contained within them, either explicitly or using the glob() function. (Use ** to force the glob() to be recursive.)
Recommended —
data = glob(["testdata/**"])
Unfortunately, there are some scenarios where directory labels must be used. For example, if the testdata directory contains files whose names don't conform to the label syntax, then explicit enumeration of files, or use of the glob() function, produces an invalid labels error. You must use directory labels in this case, but beware of the associated risk of incorrect rebuilds described above.
If you must use directory labels, keep in mind that you can't refer to the parent package with a relative ../ path; instead, use an absolute path like //data/regression:unittest/..
Any external rule, such as a test, that needs to use multiple files must explicitly declare its dependence on all of them. You can use filegroup() to group files together in the BUILD file:
filegroup(
    name = "my_data",
    srcs = glob(["my_unittest_data/*"]),
)
You can then reference the label my_data as the data dependency in your test.