+++
template = "article.html"
title = "Why Bazel?"
date = 2019-11-02T18:00:00+11:00
description = "An overview of Bazel's core concepts, from hermetic builds and reproducibility to extensibility and its three-phase build system."

[taxonomies]
tags = ["bazel"]
+++

In this post, we'll cover what [Bazel](https://bazel.build) is, how to use it, and why I chose to use it.

## What is Bazel?

Bazel is a build system released by Google in 2015. It is derived from [Blaze](https://mike-bland.com/2012/10/01/tools.html#blaze-forge-srcfs-objfs), the build system Google uses internally for most of its own codebase.

### Building at scale

Bazel puts a strong focus on hermetic builds and reproducibility. Every build step is, broadly speaking, defined as a list of inputs, tools, and outputs. This allows for efficient and robust caching: if neither the inputs nor the tools changed, the target doesn't need to be rebuilt, and this cascades through the whole build graph.

Let's see a sample definition of a C++ library, as well as a C++ binary depending on it:

*BUILD*
```python
cc_library(
    name = "my_feature",
    srcs = [
        "feature_impl.cpp",
        "utils.cpp",
    ],
    hdrs = [
        "feature.hpp",
        "utils.hpp",
    ],
)

cc_binary(
    name = "my_app",
    srcs = ["main.cpp"],
    deps = [
        ":my_feature",
    ],
)
```

`cc_library` and `cc_binary` both have an implicit dependency on a C++ toolchain. I won't go into language-specific features in this post, but if you don't tell Bazel to use a specific C++ toolchain, it will try to use your system compiler, which is convenient but loses a bit of hermeticity and reproducibility.

Everything else is fairly self-explanatory: we defined two build targets, a library called `my_feature` and a binary called `my_app` that depends on `my_feature`. If we build `my_app`, Bazel will automatically build `my_feature` first, as you would expect, and then proceed to build `my_app`.
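To make the caching idea more concrete, here is a minimal sketch in plain Python (not Starlark, and not how Bazel is actually implemented). It assumes a build step can be identified by a digest of its tool, its input contents, and the keys of its dependencies. `action_key`, `ToyBuilder`, `build_all`, and the toolchain string are all hypothetical names made up for this illustration.

```python
import hashlib


def action_key(tool, inputs, dep_keys):
    """Digest of everything a step depends on: tool, input contents, dependency keys."""
    digest = hashlib.sha256()
    digest.update(tool.encode())
    for name in sorted(inputs):
        digest.update(name.encode())
        digest.update(hashlib.sha256(inputs[name]).digest())
    for key in dep_keys:
        digest.update(key.encode())
    return digest.hexdigest()


class ToyBuilder:
    def __init__(self):
        self.cache = {}     # action key -> (fake) output
        self.executed = []  # names of the targets that were actually rebuilt

    def build(self, name, tool, inputs, deps=()):
        key = action_key(tool, inputs, [dep_key for dep_key, _ in deps])
        if key not in self.cache:
            # Cache miss: something this step depends on changed, so "run" it.
            self.executed.append(name)
            self.cache[key] = "output of " + name
        return key, self.cache[key]


def build_all(builder, lib_src, main_src):
    # Hypothetical toolchain identifier; a different one would change every key.
    tool = "c++-toolchain-v1"
    lib = builder.build("my_feature", tool, {"feature_impl.cpp": lib_src})
    return builder.build("my_app", tool, {"main.cpp": main_src}, deps=[lib])


builder = ToyBuilder()
build_all(builder, b"int f();", b"int main() {}")
first = list(builder.executed)   # nothing cached yet: both targets run

builder.executed.clear()
build_all(builder, b"int f();", b"int main() { return 1; }")
second = list(builder.executed)  # only the target whose input changed runs
```

Because a dependency's key is folded into the key of whatever depends on it, a change in the library's sources also invalidates the binary, which is the cascading behaviour described above.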
If you change `main.cpp` and rebuild `my_app`, Bazel will skip the compilation of `my_feature` entirely, as nothing in it changed.

Bazel's cache handling is really reliable. Over the past few months, I've done a lot of different things (writing my own rules, compiling a bunch of different languages, depending on third-party libraries and rules...), and never once had to run `bazel clean`. I haven't used many other build systems recently, but coming from [Gradle](https://gradle.org/) for Android, this feels really weird.

### Integrating tools and other languages

Another great aspect of Bazel is its extensibility. It works with rules defined in a language called [Starlark](https://github.com/bazelbuild/starlark), whose syntax is a subset of Python's. It leaves out a lot of standard Python features, such as I/O, global mutable state, or anything else that could affect build hermeticity.

While this isn't the focus of this article (I will cover writing a rule to run a simple tool in a later article), here is what an example rule can look like (from [Bazel's samples](https://github.com/bazelbuild/examples/blob/master/rules/shell_command/rules.bzl)):

*rules.bzl*
```python
def _convert_to_uppercase_impl(ctx):
    # Both the input and output files are specified by the BUILD file.
    in_file = ctx.file.input
    out_file = ctx.outputs.output
    ctx.actions.run_shell(
        outputs = [out_file],
        inputs = [in_file],
        arguments = [in_file.path, out_file.path],
        command = "tr '[:lower:]' '[:upper:]' < \"$1\" > \"$2\"",
    )
    # No need to return anything telling Bazel to build `out_file` when
    # building this target -- It's implied because the output is declared
    # as an attribute rather than with `declare_file()`.

convert_to_uppercase = rule(
    implementation = _convert_to_uppercase_impl,
    attrs = {
        "input": attr.label(
            allow_single_file = True,
            mandatory = True,
            doc = "The file to transform",
        ),
        "output": attr.output(doc = "The generated file"),
    },
    doc = "Transforms a text file by changing its characters to uppercase.",
)
```

Once the rule is defined, it can be reused to declare actual build targets in a simple way:

*BUILD*
```python
load(":rules.bzl", "convert_to_uppercase")

convert_to_uppercase(
    name = "foo_but_uppercase",
    input = "foo.txt",
    output = "upper_foo.txt",
)
```

Thanks to this simple extensibility, while Bazel ships only with C++ and Java support (which is actually being removed and rewritten in Starlark, to decouple it from Bazel itself), a lot of rules have been written, either by the Bazel team or by the community, to integrate languages and tools. You can find rules for [NodeJS](https://github.com/bazelbuild/rules_nodejs), [Go](https://github.com/bazelbuild/rules_go), [Rust](https://github.com/bazelbuild/rules_rust), [packaging](https://github.com/bazelbuild/rules_pkg) (generating debs, zips...), [generating Docker images](https://github.com/bazelbuild/rules_docker), [deploying stuff on Kubernetes](https://github.com/bazelbuild/rules_k8s), and a bunch of other things. And if there are no rules to run or build what you want, you can write your own!

### A three-phase build

Bazel runs in [three distinct phases](https://docs.bazel.build/versions/master/guide.html#phases). Each of them has a specific role and specific capabilities.

#### Loading

The loading phase parses and evaluates all the `BUILD` files required to build the requested target(s). This is typically the step during which any third-party dependency is fetched (just downloaded and/or extracted, nothing more yet).

#### Analysis

The second phase validates every build rule involved and generates the actual build graph.
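As a rough mental model of what "generating the build graph" means, here is a small sketch in plain Python (again, not Starlark and not Bazel's real implementation). It assumes the loading phase already produced a mapping from each target label to its direct dependencies; `targets` and `analyze` are hypothetical names used only for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical result of the loading phase: label -> labels it depends on.
targets = {
    ":my_app": [":my_feature"],
    ":my_feature": [],
}


def analyze(targets, requested):
    """Validate dependencies and return the requested target's subgraph in build order."""
    graph = {}
    stack = [requested]
    while stack:
        label = stack.pop()
        if label in graph:
            continue
        if label not in targets:
            # A dependency on an undeclared target is rejected before anything runs.
            raise ValueError(f"no such target: {label}")
        graph[label] = list(targets[label])
        stack.extend(graph[label])
    # static_order() yields dependencies before the targets that depend on them,
    # and raises graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(graph).static_order())


order = analyze(targets, ":my_app")
```

Here `order` lists `:my_feature` before `:my_app`. Only the targets reachable from the requested one end up in the graph, and nothing has been compiled at this point.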
Note that the first two phases are entirely cached: if the build graph doesn't change from one build to another (e.g. you just changed some source files), they are skipped entirely.

#### Execution

This is the phase that checks for any out-of-date output (either it doesn't exist, or its inputs changed) and runs the matching actions.

### Great tooling

Bazel comes with some really cool tools. Without spending too much time on that, here's a list of useful things:

- [ibazel](https://github.com/bazelbuild/bazel-watcher) is a filesystem watcher that rebuilds a target as soon as its input files or dependencies change.
- [query](https://docs.bazel.build/versions/master/query-how-to.html) is a built-in sub-command that helps analyse the build graph. It's incredibly feature-packed.
- [buildozer](https://github.com/bazelbuild/buildtools/tree/master/buildozer) is a tool to edit `BUILD` files across a whole repository. It can be used to add dependencies to specific targets, change target visibilities, add comments...
- [unused_deps](https://github.com/bazelbuild/buildtools/blob/master/unused_deps/README.md) detects unused dependencies for Java targets and displays the `buildozer` commands to remove them.
- Integration [with](https://github.com/bazelbuild/intellij) [different](https://github.com/bazelbuild/vscode-bazel) [IDEs](https://github.com/bazelbuild/vim-bazel).
- A set of APIs for remote caching and execution, with [a](https://gitlab.com/BuildGrid/buildgrid) [few](https://github.com/bazelbuild/bazel-buildfarm) [implementations](https://github.com/buildbarn), as well as an upcoming service on Google Cloud called Remote Build Execution, leveraging GCP to build remotely. The loading and analysis phases still run locally, while the execution phase runs remotely.
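To give an idea of the kind of question `query` answers, here is a toy model in plain Python of two of its functions, `deps` and `rdeps`. This is only a sketch over a hypothetical graph with made-up labels. The real `rdeps` also takes a universe of targets as its first argument, whereas this version simply searches the whole graph.

```python
# Hypothetical build graph: label -> direct dependencies.
graph = {
    "//app:my_app": ["//lib:my_feature", "//lib:logging"],
    "//lib:my_feature": ["//lib:utils"],
    "//lib:logging": [],
    "//lib:utils": [],
}


def deps(graph, label):
    """Transitive closure of the label's dependencies, including the label itself."""
    seen = set()
    stack = [label]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


def rdeps(graph, label):
    """Every target in the graph whose transitive dependencies include the label."""
    return {candidate for candidate in graph if label in deps(graph, candidate)}


what_my_feature_needs = deps(graph, "//lib:my_feature")
who_needs_utils = rdeps(graph, "//lib:utils")
```

The second question ("what breaks if I touch `utils`?") is the one I find most useful in a large repository.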
## Choosing a build system

When I started thinking about working on this blog again, I had a small private repository with a bunch of stuff, all compiled with Bazel. I had also noticed a [set of Starlark rules](https://github.com/stackb/rules_hugo) integrating [Hugo](https://gohugo.io/).

While I didn't need a build system, Bazel seemed interesting for several reasons:

- I could leverage my existing CI system.
- While Hugo comes with a bunch of features to e.g. pre-process Sass files, it has some kind of lock-in effect. What if I eventually realise that Hugo doesn't fit my needs? What's the cost of migrating to a new static site generator? The less I rely on Hugo-specific features, the easier this would be.
- I could integrate some custom asset pipelines. For example, I could have a diagram written with [PlantUML](http://plantuml.com/) or [Mermaid](https://mermaidjs.github.io/) and make it part of the Bazel graph, as a dependency of this blog.
- Bazel would be able to handle packaging and deployment.
- It sounded stupid enough to be a fun experiment? (Let's be honest, that's the only real reason here.)

## Closing thoughts

Bazel is quite complex, and this article only scratches the surface. The goal was not to teach you how to use Bazel (there are a lot of existing resources for that already), but to give a quick overview of the core ideas behind it.

If you found it interesting, here are some useful links:

- Bazel's [getting started](https://docs.bazel.build/versions/master/getting-started.html)
- A [list of samples](https://github.com/bazelbuild/examples) using different languages as well as defining some rules
- A (non-exhaustive) [list of rules](https://docs.bazel.build/versions/master/rules.html), as well as the documentation of all the built-in rules

In the next article, we'll see how to build a simple Kotlin app with Bazel, from scratch all the way to running it.