blog/content/posts/creating-a-blog-with-bazel/03-why-bazel/index.md

+++
template = "article.html"
title = "Why Bazel?"
date = 2019-11-02T18:00:00+11:00
description = "An overview of Bazel's core concepts, from hermetic builds and reproducibility to extensibility and its three-phase build system."

[taxonomies]
tags = ["bazel"]
+++

In this post, we'll cover what [Bazel](https://bazel.build) is, how to use it,
and why I chose to use it.

## What is Bazel?

Bazel is a build system released by Google in 2015. It is derived from
[Blaze](https://mike-bland.com/2012/10/01/tools.html#blaze-forge-srcfs-objfs),
the internal build system Google uses for most of its own codebase.

### Building at scale

Bazel has a strong focus on hermetic builds and reproducibility. Every build
step is, broadly speaking, defined as a list of inputs, tools, and outputs. This
allows for efficient and robust caching: if neither the inputs nor the tools
changed, the target doesn't need to be rebuilt, and this cascades through the
whole build graph. Let's see a sample definition of a C++ library, as well as a
C++ binary depending on it:

*BUILD*

```python
cc_library(
    name = "my_feature",
    srcs = [
        "feature_impl.cpp",
        "utils.cpp",
    ],
    hdrs = [
        "feature.hpp",
        "utils.hpp",
    ],
)

cc_binary(
    name = "my_app",
    srcs = ["main.cpp"],
    deps = [
        ":my_feature",
    ],
)
```

`cc_library` and `cc_binary` both have an implicit dependency on a C++
toolchain. I won't go into language-specific features in this post, but if you
don't tell Bazel to use a specific C++ toolchain, it will try to use your system
compiler, which is convenient but loses a bit of hermeticity and
reproducibility. Everything else is fairly straightforward: we defined two build
targets, a library called `my_feature` and a binary called `my_app` that depends
on `my_feature`. If we build `my_app`, Bazel will automatically build
`my_feature` first, as you would expect, and then proceed to build `my_app`. If
you change `main.cpp` and rebuild `my_app`, it will skip the compilation of
`my_feature` entirely, as none of its inputs changed.
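
To make the caching behaviour concrete, here is a toy model in plain Python. It
is only a sketch of the idea, not how Bazel is actually implemented: each
target's cache key is a hash of its source contents and of its dependencies'
keys, so a change only invalidates the targets downstream of it. The `files`
and `targets` dictionaries and the `action_key` and `build` helpers are
hypothetical names made up for this illustration, and the sketch ignores headers
and tools for brevity.

```python
import hashlib

# Hypothetical in-memory "workspace": file name -> contents.
files = {
    "feature_impl.cpp": "int feature() { return 1; }",
    "utils.cpp": "int util() { return 2; }",
    "main.cpp": "int main() { return 0; }",
}

# Toy build graph mirroring the BUILD file above: target -> (sources, deps).
targets = {
    "my_feature": (["feature_impl.cpp", "utils.cpp"], []),
    "my_app": (["main.cpp"], ["my_feature"]),
}


def action_key(name):
    """Hash a target's sources together with the keys of its dependencies."""
    srcs, deps = targets[name]
    digest = hashlib.sha256()
    for src in srcs:
        digest.update(src.encode())
        digest.update(files[src].encode())
    for dep in deps:
        digest.update(action_key(dep).encode())
    return digest.hexdigest()


def build(name, cache, rebuilt):
    """Build dependencies first, then rebuild `name` only if its key is new."""
    for dep in targets[name][1]:
        build(dep, cache, rebuilt)
    key = action_key(name)
    if cache.get(name) != key:
        cache[name] = key
        rebuilt.append(name)


cache = {}
first = []
build("my_app", cache, first)   # cold cache: both targets are built

files["main.cpp"] = "int main() { return 42; }"
second = []
build("my_app", cache, second)  # only my_app's own input changed

files["utils.cpp"] = "int util() { return 3; }"
third = []
build("my_app", cache, third)   # the change cascades from my_feature to my_app

print(first, second, third)
```

In this model, editing `main.cpp` rebuilds only `my_app`, while editing
`utils.cpp` rebuilds `my_feature` and, because its key feeds into `my_app`'s
key, `my_app` as well.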

Bazel's cache handling is really reliable. During the past few months, I've done
a lot of diverse things (writing my own rules, compiling a bunch of different
languages, depending on third-party libraries and rules...), and never once had
to run `bazel clean`. Now, I haven't used many other build systems in the recent
past, but for someone who previously used [Gradle](https://gradle.org/) for
Android, this feels really weird.

### Integrating tools and other languages

Another great aspect of Bazel is its extensibility. It works with rules defined
in a language called [Starlark](https://github.com/bazelbuild/starlark), whose
syntax is a subset of Python's. It leaves out many standard Python features,
such as I/O, global mutable state, or anything else that could affect build
hermeticity. While this isn't the focus of this article (I will cover writing a
rule to run a simple tool in a later article), here is what an example rule can
look like (from
[Bazel's samples](https://github.com/bazelbuild/examples/blob/master/rules/shell_command/rules.bzl)):

*rules.bzl*

```python
def _convert_to_uppercase_impl(ctx):
    # Both the input and output files are specified by the BUILD file.
    in_file = ctx.file.input
    out_file = ctx.outputs.output
    ctx.actions.run_shell(
        outputs = [out_file],
        inputs = [in_file],
        arguments = [in_file.path, out_file.path],
        command = "tr '[:lower:]' '[:upper:]' < \"$1\" > \"$2\"",
    )
    # No need to return anything telling Bazel to build `out_file` when
    # building this target -- It's implied because the output is declared
    # as an attribute rather than with `declare_file()`.

convert_to_uppercase = rule(
    implementation = _convert_to_uppercase_impl,
    attrs = {
        "input": attr.label(
            allow_single_file = True,
            mandatory = True,
            doc = "The file to transform",
        ),
        "output": attr.output(doc = "The generated file"),
    },
    doc = "Transforms a text file by changing its characters to uppercase.",
)
```

Once defined, the rule can be reused to declare actual build targets in a simple
way:

*BUILD*

```python
load(":rules.bzl", "convert_to_uppercase")

convert_to_uppercase(
    name = "foo_but_uppercase",
    input = "foo.txt",
    output = "upper_foo.txt",
)
```

Thanks to this simple extensibility, while Bazel itself ships only with C++ and
Java support (which is actually being removed and rewritten in Starlark, to
decouple it from Bazel itself), a lot of rules have been written, either by the
Bazel team or by the community, to integrate languages and tools. You can find
rules for [NodeJS](https://github.com/bazelbuild/rules_nodejs),
[Go](https://github.com/bazelbuild/rules_go),
[Rust](https://github.com/bazelbuild/rules_rust),
[packaging](https://github.com/bazelbuild/rules_pkg) (generating debs, zips...),
[generating Docker images](https://github.com/bazelbuild/rules_docker),
[deploying stuff on Kubernetes](https://github.com/bazelbuild/rules_k8s), and a
bunch of other things. And if there are no rules to run/build what you want, you
can write your own!

### A three-step build

Bazel runs in
[three distinct phases](https://docs.bazel.build/versions/master/guide.html#phases).
Each of them has a specific role, and specific capabilities.

#### Loading

The loading phase parses and evaluates all the `BUILD` files required to build
the requested target(s). This is typically the step during which any third-party
dependency is fetched (just downloaded and/or extracted, nothing more yet).

#### Analysis

The second phase validates every build rule involved, to generate the actual
build graph. Note that both of these first two phases are entirely cached, and
if the build graph doesn't change from one build to another (e.g. you just
changed some source files), they will be skipped entirely.

#### Execution

This is the phase that checks for out-of-date outputs (either missing, or with
changed inputs), and runs the matching actions.
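
As a rough mental model, the three phases can be sketched as a small Python
pipeline. This is a deliberately simplified illustration, not Bazel's real
architecture: `load`, `analyze`, and `execute` are hypothetical function names,
the "parsed `BUILD` files" are hard-coded, and staleness is tracked with plain
sets instead of content hashes.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def load():
    """Loading: pretend we parsed and evaluated the BUILD files."""
    return {
        "my_feature": {"srcs": ["feature_impl.cpp", "utils.cpp"], "deps": []},
        "my_app": {"srcs": ["main.cpp"], "deps": ["my_feature"]},
    }


def analyze(declarations):
    """Analysis: validate dependencies and derive a buildable order."""
    for name, decl in declarations.items():
        for dep in decl["deps"]:
            if dep not in declarations:
                raise ValueError(f"{name} depends on unknown target {dep}")
    graph = {name: decl["deps"] for name, decl in declarations.items()}
    # static_order() yields dependencies before the targets that need them.
    return list(TopologicalSorter(graph).static_order())


def execute(order, declarations, changed_files, existing_outputs):
    """Execution: run only actions whose output is missing or out of date."""
    ran = []
    for name in order:
        decl = declarations[name]
        stale = (
            name not in existing_outputs
            or any(src in changed_files for src in decl["srcs"])
            or any(dep in ran for dep in decl["deps"])
        )
        if stale:
            existing_outputs.add(name)  # stand-in for writing the output
            ran.append(name)
    return ran


declarations = load()
order = analyze(declarations)

outputs = set()
cold = execute(order, declarations, set(), outputs)          # nothing built yet
warm = execute(order, declarations, {"main.cpp"}, outputs)   # one file edited

print(order, cold, warm)
```

In the second run, `load` and `analyze` did not need to be called again because
the graph was unchanged, which mirrors the way the first two phases are skipped
when only source files change.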

### Great tooling

Bazel comes with some really cool tools. Without spending too much time on that,
here's a list of useful things:

- [ibazel](https://github.com/bazelbuild/bazel-watcher) is a filesystem watcher
  that rebuilds a target as soon as its input files or dependencies change.
- [query](https://docs.bazel.build/versions/master/query-how-to.html) is a
  built-in sub-command that helps analyse the build graph. It's incredibly
  feature-packed.
- [buildozer](https://github.com/bazelbuild/buildtools/tree/master/buildozer) is
  a tool to edit `BUILD` files across a whole repository. It can be used to add
  dependencies to specific targets, change target visibilities, add comments...
- [unused_deps](https://github.com/bazelbuild/buildtools/blob/master/unused_deps/README.md)
  detects unused dependencies for Java targets, and displays `buildozer`
  commands to remove them.
- Integration [with](https://github.com/bazelbuild/intellij)
  [different](https://github.com/bazelbuild/vscode-bazel)
  [IDEs](https://github.com/bazelbuild/vim-bazel).
- A set of APIs for remote caching and execution, with
  [a](https://gitlab.com/BuildGrid/buildgrid)
  [few](https://github.com/bazelbuild/bazel-buildfarm)
  [implementations](https://github.com/buildbarn), as well as an upcoming
  service on Google Cloud called Remote Build Execution, leveraging GCP to build
  remotely. The loading and analysis phases still run locally, while the
  execution phase runs remotely.

## Choosing a build system

When I started thinking about working on this blog again, I had a small private
repository with a bunch of stuff, all compiled with Bazel. I also noticed a
[set of Starlark rules](https://github.com/stackb/rules_hugo) integrating
[Hugo](https://gohugo.io/). While I didn't strictly need a build system, Bazel
seemed interesting for several reasons:

- I could leverage my existing CI system
- While Hugo comes with a bunch of features to e.g. pre-process Sass files, it
  has some kind of lock-in effect. What if I eventually realise that Hugo
  doesn't meet my needs? What's the cost of migrating to a new static site
  generator? The less I rely on Hugo-specific features, the easier this would be
- I could integrate some custom asset pipelines. For example, I could have a
  diagram written with [PlantUML](http://plantuml.com/) or
  [Mermaid](https://mermaidjs.github.io/) and have it part of the Bazel graph,
  as a dependency of this blog
- Bazel would be able to handle packaging and deployment
- It sounded stupid enough to be a fun experiment? (Let's be honest, that's the
  only real reason here.)

## Closing thoughts

Bazel is quite complex, and this article only scratches the surface. The goal
was not to teach you how to use Bazel (there are a lot of existing resources for
that already), but to give a quick overview of the core ideas behind it.

If you found it interesting, here are some useful links:

- Bazel's
  [getting started](https://docs.bazel.build/versions/master/getting-started.html)
- A [list of samples](https://github.com/bazelbuild/examples) using different
  languages as well as defining some rules
- A (non-exhaustive)
  [list of rules](https://docs.bazel.build/versions/master/rules.html), as well
  as the documentation of all the built-in rules

In the next article, we'll see how to build a simple Kotlin app with Bazel, from
scratch all the way to running it.