Migrate from Bazel

This commit is contained in:
Marc Plano-Lesay 2025-12-12 15:30:04 +11:00
commit 016dbd0814
Signed by: kernald
GPG key ID: 66A41B08CC62A6CF
59 changed files with 7044 additions and 0 deletions

+++
template = "article.html"
title = "A new beginning"
date = 2019-10-31T21:05:00+11:00
description = "Resurrecting an inactive blog by migrating from Octopress to Hugo, and embarking on a journey to build everything with Bazel."
[taxonomies]
tags = ["bazel"]
+++
This blog has been inactive for a long time. I tried to at least post an article
yearly, and next thing you know, two and a half years flew by... Halloween seemed
like a good time to resurrect it.
I recently wanted to start writing again, and faced an issue: this blog was
using Octopress 2, which has [apparently been dead for even longer than this
blog](http://octopress.org/). So I wanted to switch to another static site
generator. I found [Hugo](https://gohugo.io/), which is actively maintained and
ticked all the boxes I had, so that's what I settled on. (Sorry for the probable
RSS feed mess: while I set up 301 redirects for the old articles, I doubt they
will play nicely with RSS readers. This is actually what prompted this
article.)
This could have been an hour's worth of work: migrating the content (both Hugo
and Octopress use Markdown, so that part was really simple), finding or
putting together a nice template, and calling it a day. But how fun is that?
Instead, I chose to go with the most complex (hence fun, right?) approach
possible: using [Bazel](https://bazel.build/) to do _everything_. Sass linting
and pre-processing, HTML generation, generating a Docker image, deploying
it... with tests for a lot of things along the way.
Today, the deployment part is still missing (I'm working on it), but everything
else is pretty much ready.
I plan to describe this whole journey soon, although I don't know exactly which
form it will take yet - probably a series of small articles covering a specific
aspect. In the meantime, welcome back on a brand-new blog!

+++
template = "article.html"
title = "Compiling a Kotlin application with Bazel"
date = 2019-12-08T11:30:00+11:00
description = "A comprehensive guide to building Kotlin applications with Bazel, including dependency management, testing, and static analysis with Detekt and Ktlint."
[taxonomies]
tags = ["bazel", "kotlin"]
+++
This post will describe how to compile a small application written in Kotlin
using [Bazel](https://bazel.build), how to run its tests, and how to use
static analyzers.
## Phosphorus
Phosphorus is the application that this post will cover. It's a small utility
that I wrote to check if an image matches a reference. If it doesn't, Phosphorus
generates an image highlighting the differences. The goal is to be able to check
that something generates an image in a given way, and doesn't change - at least
if it's not expected. The actual usage will be covered later in this series.
While it's not open-source yet, it's something I intend to do at some point.
It's written in Kotlin, with a couple of external dependencies
([Clikt](https://ajalt.github.io/clikt/) and [Dagger](https://dagger.dev/)),
as well as a few tests. This is the structure:
{% mermaid(caption="Phosphorus's class diagram") %}
classDiagram
    namespace loader {
        class ImageLoader {
            <<interface>>
        }
        class ImageIoLoader {
        }
    }
    namespace differ {
        class ImageDiffer {
            <<interface>>
        }
        class ImageDifferImpl {
        }
    }
    namespace data {
        class Image
        class DiffResult
    }
    class Phosphorus

    ImageIoLoader ..|> ImageLoader
    ImageDifferImpl ..|> ImageDiffer
    Phosphorus --> ImageLoader
    Phosphorus --> ImageDiffer
{% end %}
The `differ` module contains the core logic - comparing two images, and
generating a `DiffResult`. This `DiffResult` contains both the straightforward
result of the comparison (are the two images identical?) and an image
highlighting the differences, if any. The `loader` package is responsible for
loading and writing images. Finally, the `Phosphorus` class orchestrates all
that, in addition to processing command line arguments with Clikt.
## Dependencies
Phosphorus has two dependencies: Clikt, and Dagger. Both of them are available
as Maven artifacts. In order to pull Maven artifacts, the Bazel team provides a
set of rules called
[rules_jvm_external](https://github.com/bazelbuild/rules_jvm_external/). The
idea is the following: you list a bunch of Maven coordinates and repositories,
the rule will fetch all of them (and their transitive dependencies) during the
loading phase, and generate Bazel targets corresponding to those Maven
artifacts, on which you can depend. Let's see how we can use them. The first
step is to load the rules, in the `WORKSPACE`:
```python
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
http_archive(
    name = "rules_jvm_external",
    sha256 = "62133c125bf4109dfd9d2af64830208356ce4ef8b165a6ef15bbff7460b35c3a",
    strip_prefix = "rules_jvm_external-3.0",
    url = "https://github.com/bazelbuild/rules_jvm_external/archive/3.0.zip",
)
```
Then, we can load and call `maven_install` with the list of Maven coordinates we
want, in the `WORKSPACE` too:
```python
load("@rules_jvm_external//:defs.bzl", "maven_install")
maven_install(
    artifacts = [
        "com.github.ajalt:clikt:2.2.0",
        "com.google.dagger:dagger:2.25.2",
        "com.google.dagger:dagger-compiler:2.25.2",
        "com.google.truth:truth:1.0",
        "javax.inject:javax.inject:1",
        "junit:junit:4.12",
    ],
    fetch_sources = True,
    repositories = [
        "https://maven.google.com",
        "https://repo1.maven.org/maven2",
        "https://jcenter.bintray.com/",
    ],
    strict_visibility = True,
)
```
A couple of things to note:
- We're also downloading [JUnit](https://junit.org/junit4/) and
  [Truth](https://truth.dev/), which we're going to use in tests
- `maven_install` can also try to download the sources, if they're available
  on Maven, so they can be browsed directly from the IDE
At this point, Clikt, JUnit and Truth are ready to be used. They are exposed
respectively as `@maven//:com_github_ajalt_clikt`, `@maven//:junit_junit` and
`@maven//:com_google_truth_truth`.
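The label names follow a simple mangling scheme: non-alphanumeric characters in the group and artifact IDs become underscores. Here is a plain-Python sketch of that scheme (illustrative only, not the actual rules_jvm_external implementation):

```python
def maven_label(coordinate):
    """Sketch of how maven_install derives a Bazel label from a Maven
    coordinate ("group:artifact:version"): dots and dashes map to
    underscores in both the group and artifact IDs."""
    group, artifact = coordinate.split(":")[:2]
    mangle = lambda part: part.replace(".", "_").replace("-", "_")
    return "@maven//:%s_%s" % (mangle(group), mangle(artifact))

print(maven_label("com.github.ajalt:clikt:2.2.0"))  # @maven//:com_github_ajalt_clikt
```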
Dagger, on the other hand, comes with an annotation processor and, as such,
needs some more work: it needs to be exposed as a Java plugin. Because it's a
third-party dependency, this will be defined in `//third_party/dagger/BUILD`:
```python
java_plugin(
    name = "dagger_plugin",
    processor_class = "dagger.internal.codegen.ComponentProcessor",
    deps = [
        "@maven//:com_google_dagger_dagger_compiler",
    ],
)

java_library(
    name = "dagger",
    exported_plugins = [":dagger_plugin"],
    visibility = ["//visibility:public"],
    exports = [
        "@maven//:com_google_dagger_dagger",
        "@maven//:com_google_dagger_dagger_compiler",
        "@maven//:javax_inject_javax_inject",
    ],
)
```
It can now be used as `//third_party/dagger`.
## Compilation
Bazel doesn't support Kotlin out of the box (the few natively supported
languages, Java and C++, are currently being extracted from Bazel's core, so
all languages will soon share a similar integration). In order to compile
Kotlin code, we'll have to use some Starlark rules describing how to use
`kotlinc`. A set of rules is available
[here](https://github.com/bazelbuild/rules_kotlin/). While they don't support
Kotlin/Native, they do support targeting both the JVM (including Android) and
JavaScript.
In order to use those rules, we need to declare them in the `WORKSPACE`:
```python
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
http_archive(
    name = "io_bazel_rules_kotlin",
    sha256 = "54678552125753d9fc0a37736d140f1d2e69778d3e52cf454df41a913b964ede",
    strip_prefix = "rules_kotlin-legacy-1.3.0-rc3",
    url = "https://github.com/bazelbuild/rules_kotlin/archive/legacy-1.3.0-rc3.zip",
)
load("@io_bazel_rules_kotlin//kotlin:kotlin.bzl", "kotlin_repositories", "kt_register_toolchains")
kotlin_repositories()
kt_register_toolchains()
```
Once that's done, we have access to a few rules:
- `kt_js_library`
- `kt_js_import`
- `kt_jvm_binary`
- `kt_jvm_import`
- `kt_jvm_library`
- `kt_jvm_test`
- `kt_android_library`
We're going to use `kt_jvm_binary`, `kt_jvm_library`, and `kt_jvm_test`.
As JVM-based languages have a strong correlation between packages and folder
structure, we need to be careful about where we store our source code. Bazel
treats a few directory names as potential Java "roots": `java`, `javatests`
and `src`. Anything inside a directory with one of these names needs to follow
the package/folder correlation. For example, a class
`fr.enoent.phosphorus.Phosphorus` could be stored at any of these locations:
- `//java/fr/enoent/phosphorus/Phosphorus.kt`
- `//tools/images/java/fr/enoent/phosphorus/Phosphorus.kt`
- `//java/tools/images/src/fr/enoent/phosphorus/Phosphorus.kt`
In my repo, everything Java-related is stored under `//java`, and the
corresponding tests are in `//javatests` (following the same structure).
Phosphorus will hence be in `//java/fr/enoent/phosphorus`.
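The mapping from a fully-qualified class name to its candidate locations can be sketched in plain Python (illustrative only; Bazel itself doesn't expose such a helper):

```python
def candidate_paths(fqcn, roots=("java", "javatests")):
    """For a fully-qualified Kotlin class name, list the workspace-relative
    paths where its source may live, one per recognised Java source root."""
    parts = fqcn.split(".")
    package_dirs = "/".join(parts[:-1])
    file_name = parts[-1] + ".kt"
    return ["//%s/%s/%s" % (root, package_dirs, file_name) for root in roots]

print(candidate_paths("fr.enoent.phosphorus.Phosphorus"))
```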
Let's see how we can define a simple Kotlin library, with the `data` module. In
`//java/fr/enoent/phosphorus/data/BUILD`:
```python
load("@io_bazel_rules_kotlin//kotlin:kotlin.bzl", "kt_jvm_library")
kt_jvm_library(
    name = "data",
    srcs = [
        "DiffResult.kt",
        "Image.kt",
    ],
    visibility = [
        "//java/fr/enoent/phosphorus:__subpackages__",
        "//javatests/fr/enoent/phosphorus:__subpackages__",
    ],
)
```
And that's it, we have our first library ready to be compiled! I won't
describe all the modules, as it's pretty repetitive and there's not a lot of
value in doing that, but let's see what the main binary looks like. Defined in
`//java/fr/enoent/phosphorus/BUILD`, we have:
```python
load("@io_bazel_rules_kotlin//kotlin:kotlin.bzl", "kt_jvm_binary")
kt_jvm_binary(
    name = "phosphorus",
    srcs = [
        "Phosphorus.kt",
    ],
    main_class = "fr.enoent.phosphorus.PhosphorusKt",
    visibility = ["//visibility:public"],
    deps = [
        "//java/fr/enoent/phosphorus/differ",
        "//java/fr/enoent/phosphorus/differ/impl:module",
        "//java/fr/enoent/phosphorus/loader",
        "//java/fr/enoent/phosphorus/loader/io_impl:module",
        "//third_party/dagger",
        "@maven//:com_github_ajalt_clikt",
    ],
)
```
Note the name of the `main_class`: because the `main` function is declared at
the top level of a Kotlin file, the compiler places it in a generated class
named after the file with a `Kt` suffix. Once this is defined, we can run
Phosphorus with this command:
```
bazel run //java/fr/enoent/phosphorus -- arguments passed to Phosphorus directly
```
## Tests
As mentioned previously, the test root will be `//javatests`. Because we need to
follow the packages structure, the tests themselves will be under
`//javatests/fr/enoent/phosphorus`. They are regular JUnit 4 tests, using Truth
for the assertions.
Defining unit tests is really straightforward, and closely follows the
pattern we saw with libraries and binaries. For example, the `ImageTest` test
is defined like this, in `//javatests/fr/enoent/phosphorus/data/BUILD`:
```python
load("@io_bazel_rules_kotlin//kotlin:kotlin.bzl", "kt_jvm_test")
kt_jvm_test(
    name = "ImageTest",
    srcs = ["ImageTest.kt"],
    deps = [
        "//java/fr/enoent/phosphorus/data",
        "@maven//:com_google_truth_truth",
        "@maven//:junit_junit",
    ],
)
```
This will define a Bazel target that we can invoke like this:
```
bazel test //javatests/fr/enoent/phosphorus/data:ImageTest
```
Hopefully, the output should look like this:
```
//javatests/fr/enoent/phosphorus/data:ImageTest PASSED in 0.3s
```
Once this is done, it's possible to run
`ibazel test //javatests/fr/enoent/phosphorus/...` using
[ibazel](https://github.com/bazelbuild/bazel-watcher) - it will monitor all
the test targets defined under that path, as well as their dependencies, and
re-run the affected tests as soon as something is edited. Because Bazel
encourages small build targets, has great caching, and the Kotlin compiler
uses a persistent worker, the feedback loop is really quick.
## Static analysis
For Kotlin, two tools are quite useful:
[Detekt](https://arturbosch.github.io/detekt/), and
[Ktlint](https://ktlint.github.io/). The approach for running them is the
same: for each actual Kotlin target, generate two supporting test targets that
run Detekt and Ktlint on its sources. In order to do that easily, we'll
define some wrappers around the `kt_jvm_*` set of rules. Those wrappers will be
responsible for generating the two supporting test targets, as well as calling
the original `kt_jvm_*` rule. The resulting macro will be entirely transparent
to use, the only difference being the `load` call.
Let's see what those macros could look like. In `//java/rules/defs.bzl`:
```python
load(
    "@io_bazel_rules_kotlin//kotlin:kotlin.bzl",
    upstream_kt_jvm_binary = "kt_jvm_binary",
    upstream_kt_jvm_library = "kt_jvm_library",
    upstream_kt_jvm_test = "kt_jvm_test",
)

def kt_jvm_binary(name, srcs, **kwargs):
    upstream_kt_jvm_binary(
        name = name,
        srcs = srcs,
        **kwargs
    )
    _common_tests(name = name, srcs = srcs)

def kt_jvm_library(name, srcs, **kwargs):
    upstream_kt_jvm_library(
        name = name,
        srcs = srcs,
        **kwargs
    )
    _common_tests(name = name, srcs = srcs)

def kt_jvm_test(name, srcs, size = "small", **kwargs):
    upstream_kt_jvm_test(
        name = name,
        srcs = srcs,
        size = size,
        **kwargs
    )
    _common_tests(name = name, srcs = srcs)

def _common_tests(name, srcs):
    # This will come soon, no-op for now.
    pass
```
With those wrappers defined, we need to actually call them. Because we keep
the same signatures and names as the upstream rules, we just need to update
our `load` calls in the different `BUILD` files:
`load("@io_bazel_rules_kotlin//kotlin:kotlin.bzl", "kt_jvm_test")` becomes
`load("//java/rules:defs.bzl", "kt_jvm_test")`, and so on. `_common_tests`
will be responsible for calling Detekt and Ktlint - let's see how.
### Detekt
[Artem Zinnatullin](https://twitter.com/artem_zin) published a
[set of rules](https://github.com/buildfoundation/bazel_rules_detekt/) to run
Detekt a week before I started writing this, making things way easier. As usual,
let's start by loading this in the `WORKSPACE`:
```python
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive", "http_file")

http_file(
    name = "detekt_cli_jar",
    sha256 = "e9710fb9260c0824b3a9ae7d8326294ab7a01af68cfa510cab66de964da80862",
    urls = ["https://jcenter.bintray.com/io/gitlab/arturbosch/detekt/detekt-cli/1.2.0/detekt-cli-1.2.0-all.jar"],
)

http_archive(
    name = "rules_detekt",
    sha256 = "f1632c2492291f5144a5e0f5e360a094005e20987518d228709516cc935ad1a1",
    strip_prefix = "bazel_rules_detekt-0.2.0",
    url = "https://github.com/buildfoundation/bazel_rules_detekt/archive/v0.2.0.zip",
)
```
This exposes a rule named `detekt`, which defines a build target, generating the
Detekt report. While there are a few options, we'll keep things simple. This is
what a basic invocation looks like, in any `BUILD` file:
```python
detekt(
    name = "detekt_report",
    srcs = glob(["**/*.kt"]),
)
```
We can integrate that in our `_common_tests` macro, to generate a Detekt target
automatically for every Kotlin target:
```python
def _common_tests(name, srcs):
    detekt(
        name = "%s_detekt_report" % name,
        srcs = srcs,
        config = "//java/rules/internal:detekt-config.yml",
    )
```
All our Kotlin targets now have a `$name_detekt_report` target generated
automatically, using a common Detekt configuration.
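For example, assuming the `data` library defined earlier, its report can be built on its own:

```
bazel build //java/fr/enoent/phosphorus/data:data_detekt_report
```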
The way this `detekt` rule works is by creating a build target that generates
the report, which means it's not actually a test - and a test is what we were
trying to achieve. To get one, we can use
[Bazel Skylib](https://github.com/bazelbuild/bazel-skylib)'s `build_test`. This
test rule generates a test target that just has a dependency on other targets -
if any of those dependencies fails to build, then the test fails. Otherwise, it
passes. Our macro becomes:
```python
def _common_tests(name, srcs):
    detekt(
        name = "%s_detekt_report" % name,
        srcs = srcs,
        config = "//java/rules/internal:detekt-config.yml",
    )
    build_test(
        name = "%s_detekt_test" % name,
        targets = [":%s_detekt_report" % name],
    )
```
And there we have it - a `$name_detekt_test` that is actually a test, and will
fail if Detekt raises errors.
### Ktlint
No open-source Bazel rules exist for Ktlint yet, so let's write a minimal
rule of our own. It will take as inputs the list of files to check, as well
as an optional [editorconfig](https://editorconfig.org/) configuration, that
Ktlint supports natively.
The rule definition will be split across three files: two internal files
defining respectively the _action_ (how to invoke Ktlint) and the _rule
interface_ (its name, its arguments...), and a third, public file, meant to
be consumed by users.
Let's start by downloading Ktlint itself. In the `WORKSPACE`, as usual:
```python
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_file")

http_file(
    name = "com_github_pinterest_ktlint",
    executable = True,
    sha256 = "a656342cfce5c1fa14f13353b84b1505581af246638eb970c919fb053e695d5e",
    urls = ["https://github.com/pinterest/ktlint/releases/download/0.36.0/ktlint"],
)
```
Let's move on to the action definition. It's a simple macro returning a
string, which defines how to invoke Ktlint given some arguments. In
`//tools/ktlint/internal/actions.bzl`:
```python
def ktlint(ctx, srcs, editorconfig):
    """Generates a test action linting the input files.

    Args:
      ctx: analysis context.
      srcs: list of source files to be checked.
      editorconfig: editorconfig file to use (optional).

    Returns:
      A script running ktlint on the input files.
    """
    args = []
    if editorconfig:
        args.append("--editorconfig={file}".format(file = editorconfig.short_path))
    for f in srcs:
        args.append(f.path)
    return "{linter} {args}".format(
        linter = ctx.executable._ktlint_tool.short_path,
        args = " ".join(args),
    )
```
Pretty straightforward: we combine Ktlint's executable path, the editorconfig
flag if a file is provided, and the list of source files.
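Because the action is pure string assembly, its behaviour can be exercised outside Bazel with throwaway stand-in objects mimicking `ctx` and `File` (test scaffolding only, not Bazel API):

```python
class Stub:
    """Ad-hoc stand-in for Bazel's ctx/File objects (test-only)."""
    def __init__(self, **attrs):
        self.__dict__.update(attrs)

def ktlint(ctx, srcs, editorconfig):
    # Same logic as the Starlark action above.
    args = []
    if editorconfig:
        args.append("--editorconfig={file}".format(file = editorconfig.short_path))
    for f in srcs:
        args.append(f.path)
    return "{linter} {args}".format(
        linter = ctx.executable._ktlint_tool.short_path,
        args = " ".join(args),
    )

ctx = Stub(executable = Stub(_ktlint_tool = Stub(short_path = "ktlint")))
srcs = [Stub(path = "Phosphorus.kt"), Stub(path = "data/Image.kt")]
print(ktlint(ctx, srcs, Stub(short_path = ".editorconfig")))
# ktlint --editorconfig=.editorconfig Phosphorus.kt data/Image.kt
```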
Now for the rule interface: we will define a rule named `ktlint_test`.
Building a `ktlint_test` target generates a shell script invoking Ktlint with
the given set of arguments, and running the target executes that script,
hence running Ktlint as well. In `//tools/ktlint/internal/rules.bzl`:
```python
load(":actions.bzl", "ktlint")

def _ktlint_test_impl(ctx):
    script = ktlint(
        ctx,
        srcs = ctx.files.srcs,
        editorconfig = ctx.file.editorconfig,
    )
    ctx.actions.write(
        output = ctx.outputs.executable,
        content = script,
    )
    files = [ctx.executable._ktlint_tool] + ctx.files.srcs
    if ctx.file.editorconfig:
        files.append(ctx.file.editorconfig)
    return [
        DefaultInfo(
            runfiles = ctx.runfiles(
                files = files,
            ).merge(ctx.attr._ktlint_tool[DefaultInfo].default_runfiles),
            executable = ctx.outputs.executable,
        ),
    ]

ktlint_test = rule(
    _ktlint_test_impl,
    attrs = {
        "srcs": attr.label_list(
            allow_files = [".kt", ".kts"],
            doc = "Source files to lint",
            mandatory = True,
            allow_empty = False,
        ),
        "editorconfig": attr.label(
            doc = "Editor config file to use",
            mandatory = False,
            allow_single_file = True,
        ),
        "_ktlint_tool": attr.label(
            default = "@com_github_pinterest_ktlint//file",
            executable = True,
            cfg = "target",
        ),
    },
    doc = "Lint Kotlin files, and fail if the linter raises errors.",
    test = True,
)
```
We have two different parts here - the definition of the interface, with the
call to `rule`, and the implementation of that rule, defined as
`_ktlint_test_impl`.
The call to `rule` defines how this rule can be invoked. We declare that it
requires a list of `.kt` and/or `.kts` files named `srcs`, an optional file
named `editorconfig`, as well as a hidden attribute named `_ktlint_tool`,
which is just a helper referencing the Ktlint binary - we point it at the
file we defined in the `WORKSPACE` earlier.
The actual implementation works in multiple steps:

1. It invokes the `ktlint` action we defined earlier, to generate the script
   that will be invoked.
2. It generates an action writing that script to a file referred to as
   `ctx.outputs.executable`. Bazel knows how to handle this file; we don't
   need to worry about where it lives, and it won't end up in the source tree
   anyway.
3. It computes the list of files needed to run this target. This is what
   allows Bazel to ensure hermeticity: it knows this rule needs to be re-run
   if any of those files change, and if the target runs in a sandboxed
   environment (the default on most platforms, as far as I'm aware), only
   those files will be available.
4. It returns a `Provider`, responsible for holding a description of what
   this target needs.
Finally, we write a file that only exposes the bits users should care about.
It's not mandatory, but it draws a clear line between implementation details
and what users can actually rely on. In `//tools/ktlint/defs.bzl`:
```python
load(
    "//tools/ktlint/internal:rules.bzl",
    _ktlint_test = "ktlint_test",
)

ktlint_test = _ktlint_test
```
We just expose the rule we wrote in `rules.bzl` as `ktlint_test`.
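With the public file in place, a `ktlint_test` target can also be declared directly in any `BUILD` file, independently of the Kotlin macros (the target name here is illustrative):

```python
load("//tools/ktlint:defs.bzl", "ktlint_test")

ktlint_test(
    name = "style_test",
    srcs = glob(["**/*.kt"]),
    editorconfig = "//:.editorconfig",
)
```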
Once this is done, we can use this `ktlint_test` rule where we needed it, in our
`_common_tests` macro for Kotlin targets:
```python
def _common_tests(name, srcs):
    ktlint_test(
        name = "%s_ktlint_test" % name,
        srcs = srcs,
        editorconfig = "//:.editorconfig",
    )
    detekt(
        name = "%s_detekt_report" % name,
        srcs = srcs,
        config = "//java/rules/internal:detekt-config.yml",
    )
    build_test(
        name = "%s_detekt_test" % name,
        targets = [":%s_detekt_report" % name],
    )
```
And there we have it - all our Kotlin targets have both Detekt and Ktlint test
targets. Because we're exposing those as Bazel targets, we automatically benefit
from its caching and remote execution capabilities - those linters won't re-run
if the inputs didn't change, and can run remotely, with Bazel being aware of
which files are needed on the remote machine.
## Closing thoughts
But what's the link between generating a blog with Bazel and compiling a Kotlin
application? Well, almost none, but there is one. The class diagram included
earlier in this article is generated with a tool called
[PlantUML](http://plantuml.com/), which generates images from a text
representation of a graph. The next article in this series will talk about
integrating this tool into Bazel (in a similar way to what we did with
Ktlint), but also how to test the Bazel rules. And to have some integration
tests, Phosphorus will come in handy!

+++
template = "article.html"
title = "Why Bazel?"
date = 2019-11-02T18:00:00+11:00
description = "An overview of Bazel's core concepts, from hermetic builds and reproducibility to extensibility and its three-phase build system."
[taxonomies]
tags = ["bazel"]
+++
In this post, we'll cover what [Bazel](https://bazel.build) is, how to use it,
and why I chose to use it.
## What is Bazel?
Bazel is a build system released by Google in 2015. It is derived from
[Blaze](https://mike-bland.com/2012/10/01/tools.html#blaze-forge-srcfs-objfs),
the build system Google uses internally for most of its own code base.
### Building at scale
Bazel has a huge focus on hermetic builds and reproducibility. Every build
step is, from a really broad perspective, defined as a list of inputs, tools,
and outputs. This allows for efficient and robust caching: if neither the
inputs nor the tools changed, the target doesn't need to be rebuilt, and this
cascades through the whole build graph. Let's see a sample definition of a
C++ library, as well as a C++ binary depending on it:
*BUILD*
```python
cc_library(
    name = "my_feature",
    srcs = [
        "feature_impl.cpp",
        "utils.cpp",
    ],
    hdrs = [
        "feature.hpp",
        "utils.hpp",
    ],
)

cc_binary(
    name = "my_app",
    srcs = ["main.cpp"],
    deps = [
        ":my_feature",
    ],
)
```
`cc_library` and `cc_binary` both have an implicit dependency on a C++
toolchain (I won't go into any language-specific features in this post, but
if you don't tell Bazel to use a specific C++ toolchain, it will try to use
your system compiler - which is convenient, but loses a bit of hermeticity
and reproducibility). Everything else is pretty obvious here: we defined two
build targets, a library called `my_feature` and a binary called `my_app`
that depends on it. If we build `my_app`, Bazel will automatically build
`my_feature` first, as you would expect, and then proceed to build `my_app`.
If you change `main.cpp` and re-build `my_app`, the compilation of
`my_feature` is skipped entirely, as nothing it depends on changed.
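A sketch of that incremental behaviour from the command line (the target labels assume the `BUILD` file above sits at the workspace root):

```
bazel build //:my_app   # builds my_feature, then my_app
touch main.cpp
bazel build //:my_app   # recompiles main.cpp and relinks my_app; my_feature comes from cache
```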
Bazel's cache handling is really reliable. Over the past few months, I've
done a lot of diverse things (writing my own rules, compiling a bunch of
different languages, depending on third-party libraries and rules...), and
never once had to run `bazel clean`. I haven't used many other build systems
in the recent past, but coming from [Gradle](https://gradle.org/) for Android
development, this feels really weird.
### Integrating tools and other languages
Another great aspect of Bazel is its extensibility. It works with rules
defined in a language called
[Starlark](https://github.com/bazelbuild/starlark), whose syntax is a subset
of Python's. It omits a lot of standard Python features, such as I/O, mutable
collections, or anything else that could affect build hermeticity. While this
isn't the focus of this article (I will cover writing a rule to run a simple
tool in a later article), here is what an example rule can look like (from
[Bazel's samples](https://github.com/bazelbuild/examples/blob/master/rules/shell_command/rules.bzl)):
*rules.bzl*
```python
def _convert_to_uppercase_impl(ctx):
    # Both the input and output files are specified by the BUILD file.
    in_file = ctx.file.input
    out_file = ctx.outputs.output
    ctx.actions.run_shell(
        outputs = [out_file],
        inputs = [in_file],
        arguments = [in_file.path, out_file.path],
        command = "tr '[:lower:]' '[:upper:]' < \"$1\" > \"$2\"",
    )
    # No need to return anything telling Bazel to build `out_file` when
    # building this target -- It's implied because the output is declared
    # as an attribute rather than with `declare_file()`.

convert_to_uppercase = rule(
    implementation = _convert_to_uppercase_impl,
    attrs = {
        "input": attr.label(
            allow_single_file = True,
            mandatory = True,
            doc = "The file to transform",
        ),
        "output": attr.output(doc = "The generated file"),
    },
    doc = "Transforms a text file by changing its characters to uppercase.",
)
```
Once it's defined, it's re-usable to define actual build targets in a simple way:
*BUILD*
```python
load(":rules.bzl", "convert_to_uppercase")

convert_to_uppercase(
    name = "foo_but_uppercase",
    input = "foo.txt",
    output = "upper_foo.txt",
)
```
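Building the target then produces the declared output under `bazel-bin`, Bazel's standard output layout:

```
bazel build //:foo_but_uppercase
cat bazel-bin/upper_foo.txt
```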
As a result of this simple extensibility, while Bazel ships only with C++ and
Java support (which are actually getting removed and rewritten in Starlark, to
decouple them from Bazel itself), a lot of rules have been written either by the
Bazel team or by the community, to integrate languages and tools. You can find
rules for [NodeJS](https://github.com/bazelbuild/rules_nodejs),
[Go](https://github.com/bazelbuild/rules_go),
[Rust](https://github.com/bazelbuild/rules_rust),
[packaging](https://github.com/bazelbuild/rules_pkg) (generating debs, zips...),
[generating Docker images](https://github.com/bazelbuild/rules_docker),
[deploying stuff on Kubernetes](https://github.com/bazelbuild/rules_k8s), and a
bunch of other things. And if there are no rules to run/build what you want, you
can write your own!
### A three-phase build
Bazel runs in
[three distinct phases](https://docs.bazel.build/versions/master/guide.html#phases).
Each of them has a specific role, and specific capabilities.
#### Loading
The loading phase parses and evaluates all the `BUILD` files required to
build the requested target(s). This is typically the step during which any
third-party dependency is fetched (just downloaded and/or extracted, nothing
more yet).
#### Analysis
The second phase validates every involved build rule to generate the actual
build graph. Note that both of these first two phases are entirely cached: if
the build graph doesn't change from one build to another (e.g. you only
changed some source files), they are skipped entirely.
#### Execution
This is the phase that checks for out-of-date outputs (either missing, or
with changed inputs), and runs the matching actions.
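The staleness check can be pictured as content-based action caching; here is a toy Python sketch of the idea (not Bazel's actual implementation):

```python
import hashlib

def action_key(input_contents, command):
    """Toy model of an action cache key: a digest of every input file's
    content plus the command line. If the key is unchanged since the last
    build, the action's outputs are up to date and it can be skipped."""
    digest = hashlib.sha256()
    for path in sorted(input_contents):
        digest.update(path.encode())
        digest.update(hashlib.sha256(input_contents[path]).digest())
    digest.update(command.encode())
    return digest.hexdigest()

before = action_key({"main.cpp": b"int main() {}"}, "g++ -c main.cpp")
after = action_key({"main.cpp": b"int main() { return 1; }"}, "g++ -c main.cpp")
print(before != after)  # True: an input changed, so the action must re-run
```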
### Great tooling
Bazel comes with some really cool tools. Without spending too much time on that,
here's a list of useful things:
- [ibazel](https://github.com/bazelbuild/bazel-watcher) is a filesystem
  watcher that rebuilds a target as soon as its input files or dependencies
  change.
- [query](https://docs.bazel.build/versions/master/query-how-to.html) is a
  built-in sub-command that helps analyse the build graph. It's incredibly
  feature-packed.
- [buildozer](https://github.com/bazelbuild/buildtools/tree/master/buildozer)
  is a tool to edit `BUILD` files across a whole repository. It can be used
  to add dependencies to specific targets, change target visibilities, add
  comments...
- [unused_deps](https://github.com/bazelbuild/buildtools/blob/master/unused_deps/README.md)
  detects unused dependencies in Java targets, and displays the `buildozer`
  commands to remove them.
- Integration [with](https://github.com/bazelbuild/intellij)
[different](https://github.com/bazelbuild/vscode-bazel)
[IDEs](https://github.com/bazelbuild/vim-bazel).
- A set of APIs for remote caching and execution, with
[a](https://gitlab.com/BuildGrid/buildgrid)
[few](https://github.com/bazelbuild/bazel-buildfarm)
[implementations](https://github.com/buildbarn), as well as an upcoming
service on Google Cloud called Remote Build Execution, leveraging GCP to build
remotely. The loading and analysis phases are still running locally, while the
execution phase is running remotely.
## Choosing a build system
At the time I started thinking about working on this blog again, I had a small
private repository with a bunch of stuff, all compiled with Bazel. I also
noticed a [set of Starlark rules](https://github.com/stackb/rules_hugo)
integrating [Hugo](https://gohugo.io/). While I didn't need a build system,
Bazel seemed to be interesting for multiple aspects:
- I could leverage my existing CI system
- While Hugo comes with a bunch of features to e.g. pre-process Sass files,
  relying on them has some kind of lock-in effect. What if I eventually
  realise that Hugo doesn't fill my needs? What's the cost of migrating to
  another static site generator? The less I rely on Hugo-specific features,
  the easier that migration would be
- I could integrate some custom asset pipelines. For example, a diagram
  written with [PlantUML](http://plantuml.com/) or
  [Mermaid](https://mermaidjs.github.io/) could be part of the Bazel graph,
  as a dependency of this blog
- Bazel would be able to handle packaging and deployment
- It sounded stupid enough to be a fun experiment? (Let's be honest, that's the
only real reason here.)
## Closing thoughts
Bazel is quite complex, and this article only scratches the surface. The goal
was not to teach you how to use Bazel (there are a lot of existing resources for
that already), but to give a quick overview of the core ideas behind it.
If you found it interesting, here are some useful links:
- Bazel's
[getting started](https://docs.bazel.build/versions/master/getting-started.html)
- A [list of samples](https://github.com/bazelbuild/examples) using different
languages as well as defining some rules
- A (non-exhaustive)
[list of rules](https://docs.bazel.build/versions/master/rules.html), as well
as the documentation of all the built-in rules
In the next article, we'll see how to build a simple Kotlin app with Bazel, from
scratch all the way to running it.

+++
template = "article.html"
title = "Writing a Bazel rule set"
date = 2020-05-16T15:55:00+11:00
description = "Learn how to write custom Bazel rules by integrating PlantUML, including rule implementation and testing strategies."
[taxonomies]
tags = ["bazel", "plantuml"]
+++
This post will cover two things:
- How to run an arbitrary tool with Bazel (in this case,
[PlantUML](https://plantuml.com/), a tool to generate diagrams), by writing a
rule set
- How to test this rule set.
It should be mentioned that while I was working on this rule set, it became
more and more apparent that PlantUML is not a great candidate for this kind
of integration, as its output is platform-dependent (the font rendering
differs between platforms). Despite that, it's still a simple tool and as
such its integration is simple, albeit not perfect (the rendering tests I
wrote need to run on the same platform every time).
## PlantUML usage
PlantUML is a tool that takes a text input looking like this:
```
@startuml
Alice -> Bob: SYN
@enduml
```
And outputs an image looking like this:
{% mermaid(caption="PlantUML sample output") %}
sequenceDiagram
Alice->>Bob: SYN
{% end %}
PlantUML has multiple ways of being invoked (CLI, GUI, as well as a _lot_ of
integrations with different tools), but we'll go with the easiest: a one-shot
CLI invocation. It takes as inputs:
- A text file, representing a diagram
- An optional configuration file, giving control over the output
It then outputs a single image file, which can be of different formats (we'll
just cover SVG and PNG in this article, but adding support for other formats is
trivial).
PlantUML ships as a JAR file, which needs to be run with Java. An invocation
generating the sample image above would look like this:
```bash
java -jar plantuml.jar -tpng -p < 'mysource.puml' > 'dir/myoutput.png'
```
Pretty straightforward: run the JAR with a single option for the image type,
pipe in the content of the input file, and get the output file back. The `-p`
flag is the short form of `-pipe`, which we use because piping is the only way
of properly controlling the output path (without it, PlantUML tries to be
smart and places the output next to the input).
With a configuration file:
```bash
java -jar plantuml.jar -tpng -config config.puml -p < 'mysource.puml' > 'dir/myoutput.png'
```
Simple enough, right? Well, not really. PlantUML actually embeds some
metadata in the files it generates. For example, when generating an SVG:
```svg
<!-- The actual SVG image has been omitted, as this part is deterministic and
pretty long. -->
<svg><g>
<!--MD5=[8d4298e8c40046c92682b92efe1f786e]
@startuml
Alice -> Bob: SYN
@enduml
PlantUML version 1.2020.07(Sun Apr 19 21:42:40 AEST 2020)
(GPL source distribution)
Java Runtime: OpenJDK Runtime Environment
JVM: OpenJDK 64-Bit Server VM
Java Version: 11.0.6+10
Operating System: Linux
Default Encoding: UTF-8
Language: en
Country: AU
--></g></svg>
```
This makes PlantUML non-hermetic by default (in addition to the fonts issue
mentioned earlier). While PlantUML has a simple way of working around that (in
the form of a `-nometadata` flag), this is something to keep in mind when
integrating a tool with Bazel: is this tool usable in a hermetic way? If not,
how can the impact of this non-hermeticity be minimised?
From there, here is the invocation we'll work with:
```bash
java -jar plantuml.jar -tpng -nometadata -config config.puml \
-p < 'mysource.puml' > 'dir/myoutput.png'
```
## Getting PlantUML
PlantUML is a Java application, available as a JAR on Maven. As such, it can be
fetched with the help of
[rules_jvm_external](https://github.com/bazelbuild/rules_jvm_external/), as was
explained in
[a previous article](@/posts/creating-a-blog-with-bazel/02-compiling-a-kotlin-application-with-bazel/index.md#dependencies).
The Maven rules will expose the JAR as a library, but we need a binary to be
able to run it. In e.g. `//third_party/plantuml/BUILD`:
```python
load("@rules_java//java:defs.bzl", "java_binary")
java_binary(
    name = "plantuml",
    main_class = "net.sourceforge.plantuml.Run",
    visibility = ["//visibility:public"],
    runtime_deps = [
        "@maven//:net_sourceforge_plantuml_plantuml",
    ],
)
```
From there, we can use `//third_party/plantuml` like any other Bazel binary
target - we can run it with `bazel run`, and we can pass it as a tool for rule
actions.
This is a pattern that works well for any JVM-based tool. Other kinds of tools
will need a different preparation step to make them available through Bazel -
but as long as you can get a binary, you should be good.
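As a purely illustrative sketch (the repository name, URL, checksum and binary name below are all placeholders, not a real release), a prebuilt non-JVM tool could be fetched in the `WORKSPACE` and exposed as a target in a similar way:

```python
# WORKSPACE - illustrative only; every name and URL here is a placeholder.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "some_tool",
    urls = ["https://example.com/some-tool-1.0-linux-x86_64.tar.gz"],
    sha256 = "<archive checksum>",
    build_file_content = """exports_files(["some-tool"])""",
)
```

From there, `@some_tool//:some-tool` could be passed to actions as a tool, just like the `java_binary` target above.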
## Rule set structure
This rule set will follow the same structure we previously used for
[Ktlint](@/posts/creating-a-blog-with-bazel/02-compiling-a-kotlin-application-with-bazel/index.md#ktlint):
- Based in `//tools/plantuml`
- A public interface exposed in `//tools/plantuml/defs.bzl`
- Internal actions definition in `//tools/plantuml/internal/actions.bzl`
- Internal rule definition in `//tools/plantuml/internal/rules.bzl`
But in addition:
- Tests for the actions in `//tools/plantuml/internal/actions_test.bzl`
- Integration tests in `//tools/plantuml/tests`
Let's start by defining our actions.
## Actions
### Implementation
We need only one action for our rule: one that takes a source file, an optional
configuration file, and the PlantUML binary, and emits the output file by
calling PlantUML. Let's assume for a moment that we have a helper function,
`plantuml_command_line`, which, given the proper inputs, returns the PlantUML
command line to run, and write the action from there:
```python
def plantuml_generate(ctx, src, format, config, out):
    """Generates a single PlantUML graph from a puml file.

    Args:
      ctx: analysis context.
      src: source file to be read.
      format: the output image format.
      config: the configuration file. Optional.
      out: output image file.
    """
    command = plantuml_command_line(
        executable = ctx.executable._plantuml_tool.path,
        config = config.path if config else None,
        src = src.path,
        output = out.path,
        output_format = format,
    )

    inputs = [src]
    if config:
        inputs.append(config)

    ctx.actions.run_shell(
        outputs = [out],
        inputs = inputs,
        tools = [ctx.executable._plantuml_tool],
        command = command,
        mnemonic = "PlantUML",
        progress_message = "Generating %s" % out.basename,
    )
```
This is pretty straightforward: we generate the command line, passing the
attributes' respective paths (or `None` for the configuration file if it's not
provided, since it's optional), as well as the requested image format. We
declare both our source file and our configuration file as inputs, and
PlantUML as a required tool.
Now let's implement our helper function. It is, once again, really
straightforward: it gets a bunch of paths as input, and needs to generate a
command-line invocation (in the form of a simple string) from them:
```python
load("@bazel_skylib//lib:shell.bzl", "shell")

def plantuml_command_line(executable, config, src, output, output_format):
    """Formats the command line to call PlantUML with the given arguments.

    Args:
      executable: path to the PlantUML binary.
      config: path to the configuration file. Optional.
      src: path to the source file.
      output: path to the output file.
      output_format: image format of the output file.

    Returns:
      A command to invoke PlantUML.
    """
    command = "%s -nometadata -p -t%s" % (
        shell.quote(executable),
        output_format,
    )
    if config:
        command += " -config %s" % shell.quote(config)
    command += " < %s > %s" % (
        shell.quote(src),
        shell.quote(output),
    )
    return command
```
An interesting note is that because PlantUML is already integrated as an
executable Bazel target, we don't care whether it's a JAR, a C++ binary or a
shell script: Bazel knows exactly what this executable is made of, how to prepare
(e.g. compile) it if necessary, its runtime dependencies (in this case, a JRE)
and, more importantly in this context, how to run it. We can treat our tool
target as a single executable file, and run it as such just from its path.
Bazel will automatically make sure to provide us with everything we need. (For
more details: the target actually points to a shell script generated by Bazel,
through the Java rules, which in the case of a `java_binary` target is
responsible for defining the classpath, among other things. The JAR file is
merely a dependency of this shell script, and as such is provided as a runtime
dependency.)
Writing this as a helper function rather than directly in the action definition
serves two purposes: not only does it make the whole thing slightly easier to
read, but this function, which contains the logic (even though in this case it's
really simple), is easily testable: it takes only strings as arguments, and
returns a string. It's also a pure function: it has no side effects, and as
such it will always return the same output given the same set of inputs.
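Since Starlark is close to a subset of Python, this purity is easy to demonstrate outside Bazel. Here is a minimal sketch of the same logic in plain Python, with a hand-rolled stand-in for Skylib's `shell.quote` (which always wraps its argument in single quotes):

```python
def shell_quote(s):
    # Stand-in for Skylib's shell.quote: wrap in single quotes,
    # escaping any embedded single quote.
    return "'" + s.replace("'", "'\\''") + "'"

def plantuml_command_line(executable, config, src, output, output_format):
    # Mirrors the Starlark helper: pure string manipulation, no side effects.
    command = "%s -nometadata -p -t%s" % (shell_quote(executable), output_format)
    if config:
        command += " -config %s" % shell_quote(config)
    command += " < %s > %s" % (shell_quote(src), shell_quote(output))
    return command

print(plantuml_command_line(
    executable = "/bin/plantuml",
    config = None,
    src = "mysource.puml",
    output = "dir/myoutput.png",
    output_format = "png",
))
# → '/bin/plantuml' -nometadata -p -tpng < 'mysource.puml' > 'dir/myoutput.png'
```

Calling it twice with the same arguments will always produce the same string, which is exactly what makes it trivial to unit test.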
### Tests
To test Starlark functions like this one, Bazel's
[Skylib](https://github.com/bazelbuild/bazel-skylib) provides a test framework
which, while requiring a bit of boilerplate, is pretty simple to use. In this
specific case, we only have two cases to test: with and without a
configuration file provided. Error cases should be unreachable due to the way
the rule will be defined: Bazel will be responsible for enforcing the presence
of an executable target for PlantUML's binary, a valid image format... Let's see
how that works. In `//tools/plantuml/internal/actions_test.bzl`:
```python
"""Unit tests for PlantUML action"""

load("@bazel_skylib//lib:unittest.bzl", "asserts", "unittest")
load(":actions.bzl", "plantuml_command_line")

def _no_config_impl(ctx):
    env = unittest.begin(ctx)
    asserts.equals(
        env,
        "'/bin/plantuml' -nometadata -p -tpng < 'mysource.puml' > 'dir/myoutput.png'",
        plantuml_command_line(
            executable = "/bin/plantuml",
            config = None,
            src = "mysource.puml",
            output = "dir/myoutput.png",
            output_format = "png",
        ),
    )
    return unittest.end(env)

no_config_test = unittest.make(_no_config_impl)

def _with_config_impl(ctx):
    env = unittest.begin(ctx)
    asserts.equals(
        env,
        "'/bin/plantuml' -nometadata -p -tpng -config 'myskin.skin' < 'mysource.puml' > 'dir/myoutput.png'",
        plantuml_command_line(
            executable = "/bin/plantuml",
            config = "myskin.skin",
            src = "mysource.puml",
            output = "dir/myoutput.png",
            output_format = "png",
        ),
    )
    return unittest.end(env)

with_config_test = unittest.make(_with_config_impl)

def actions_test_suite():
    unittest.suite(
        "actions_tests",
        no_config_test,
        with_config_test,
    )
```
First, we define two functions, which are the actual test logic:
`_no_config_impl` and `_with_config_impl`. Their content is pretty simple: we
start a unit test environment, we invoke our test function and assert that the
result is indeed what we expected, and we close the unit test environment. The
return value is needed by the test framework, as it's what records which
assertions passed or failed.
Next, we declare those two functions as actual unit tests by wrapping them with
a call to `unittest.make`. We can then add those two test targets to a test
suite, which is what actually generates the test targets when invoked. This
means the macro needs to be invoked in the `BUILD` file:
```python
load(":actions_test.bzl", "actions_test_suite")
actions_test_suite()
```
We can run our tests, and hopefully everything should pass:
```bash
$ bazel test //tools/plantuml/internal:actions_tests
INFO: Invocation ID: 112bd049-7398-4b23-b62b-1398e9731eb7
INFO: Analyzed 2 targets (5 packages loaded, 927 targets configured).
INFO: Found 2 test targets...
INFO: Elapsed time: 0.238s, Critical Path: 0.00s
INFO: 0 processes.
//tools/plantuml/internal:actions_tests_test_0 PASSED in 0.4s
//tools/plantuml/internal:actions_tests_test_1 PASSED in 0.3s
Executed 0 out of 2 tests: 2 tests pass.
INFO: Build completed successfully, 1 total action
```
## Rules definition
Similarly to the actions definition, we only have one rule to define here.
Let's call it `plantuml_graph()`. It needs our usual set of inputs, and outputs
a single file, whose name will be `${target_name}.${image_format}`. It's also
where we define the set of acceptable image formats, the fact that the input
file is mandatory but the configuration file optional, and the actual
executable target to use for PlantUML. The only thing we actually do is, as
expected, call our `plantuml_generate` action defined above.
```python
load(
    ":actions.bzl",
    "plantuml_generate",
)

def _plantuml_graph_impl(ctx):
    output = ctx.actions.declare_file("{name}.{format}".format(
        name = ctx.label.name,
        format = ctx.attr.format,
    ))

    plantuml_generate(
        ctx,
        src = ctx.file.src,
        format = ctx.attr.format,
        config = ctx.file.config,
        out = output,
    )

    return [DefaultInfo(
        files = depset([output]),
    )]

plantuml_graph = rule(
    _plantuml_graph_impl,
    attrs = {
        "config": attr.label(
            doc = "Configuration file to pass to PlantUML. Useful to tweak the skin",
            allow_single_file = True,
        ),
        "format": attr.string(
            doc = "Output image format",
            default = "png",
            values = ["png", "svg"],
        ),
        "src": attr.label(
            allow_single_file = [".puml"],
            doc = "Source file to generate the graph from",
            mandatory = True,
        ),
        "_plantuml_tool": attr.label(
            default = "//third_party/plantuml",
            executable = True,
            cfg = "host",
        ),
    },
    doc = "Generates a PlantUML graph from a puml file",
)
```
## Public interface
As we only have a single rule, and nothing else specific to do, the public
interface is dead simple:
```python
load("//tools/plantuml/internal:rules.bzl", _plantuml_graph = "plantuml_graph")
plantuml_graph = _plantuml_graph
```
You might then be wondering: why is this useful, and why shouldn't I just import
the rule definition from `//tools/plantuml/internal:rules.bzl` directly? Having
this kind of public interface allows you to tweak the actual rule definition
without breaking any consumer site, as long as you respect the public interface.
You can also add features to every consumer site in a really simple way. Let's
imagine, for example, that you have a `view_image` rule which, given an image
file, generates a script to view it; you could then transform your public
interface like this:
```python
load("//tools/plantuml/internal:rules.bzl", _plantuml_graph = "plantuml_graph")
load("//tools/utils:defs.bzl", _view_image = "view_image")

def plantuml_graph(name, src, config, format):
    _plantuml_graph(
        name = name,
        src = src,
        config = config,
        format = format,
    )

    _view_image(
        name = "%s.view" % name,
        src = ":%s.%s" % (name, format),
    )
```
And suddenly, all your PlantUML graphs have an implicit `.view` target defined
automatically, allowing you to see the output directly without having to dig
into Bazel's output directories.
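Putting it together, a consumer's `BUILD` file could look roughly like this (the target and file names are made up for illustration):

```python
# Some package's BUILD file - names are illustrative.
load("//tools/plantuml:defs.bzl", "plantuml_graph")

plantuml_graph(
    name = "handshake",
    src = "handshake.puml",
    format = "svg",
)
```

With the macro version of the public interface above, this target would also come with an implicit `:handshake.view` helper for free.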
A set of Bazel rules for LaTeX actually provides such a feature to view the PDF
output: they have a
[`view_pdf.sh` script](https://github.com/ProdriveTechnologies/bazel-latex/blob/master/view_pdf.sh),
used by their main
[`latex_document` macro](https://github.com/ProdriveTechnologies/bazel-latex/blob/master/latex.bzl#L45).
## Further testing
For a rule this simple, I took just one further step: keeping a few reference
PlantUML graphs, as well as their expected rendered output, which I compare
with Phosphorus, a really simple tool I wrote to help compare two images,
covered in the previous article (I told you it would be useful!). But for more
complex cases, Skylib offers more utilities, like an
[analysis test](https://github.com/bazelbuild/bazel-skylib/blob/master/docs/analysis_test_doc.md)
and a
[build test](https://github.com/bazelbuild/bazel-skylib/blob/master/docs/build_test_doc.md).
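Phosphorus itself is out of scope here, but the core idea of such a rendering test can be sketched in a few lines of plain Python. This digest comparison is a hypothetical stand-in for the real tool, and is only meaningful because the rendering is deterministic on a fixed platform with `-nometadata`:

```python
import hashlib

def images_match(rendered_path, golden_path):
    # Hypothetical stand-in for a Phosphorus-style check: compare the
    # rendered image against a checked-in golden file, byte for byte,
    # via SHA-256 digests.
    def digest(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    return digest(rendered_path) == digest(golden_path)
```

A real image-comparison tool would of course do better than byte equality (e.g. tolerate tiny pixel differences), which is exactly why the platform-dependent font rendering mentioned earlier is a problem for this naive approach.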
## Closing thoughts
While writing this kind of tooling might look like a lot of work, it's
actually pretty mechanical in most cases. I worked on a few others, like
[markdownlint](https://github.com/igorshubovych/markdownlint-cli), which now
runs on all my Markdown files as regular Bazel test targets, or
[pngcrush](https://pmt.sourceforge.io/pngcrush/), which is run on the PNG files
hosted on this blog. In a monorepo, writing such a rule is the kind of task
that you do once, and it just keeps on giving - you can easily compose
different rules around a main use-case, with a bunch of test targets generated
virtually for free.
On another note, I'm aware that having all this in a public repository would
make things much simpler to follow. Sadly, it's part of a larger mono-repository
which makes open-sourcing only the relevant parts tricky. Dumping a snapshot
somewhere would be an option, but I'd rather have an actual living repository.
Now that we have all the tools we need (that was kind of convoluted, I'll give
you that), there are only two steps left to cover:
- Generating the actual blog (ironically enough, this will be a really quick
step, despite being the only really important one)
- Managing the deployment.
We're getting there!
+++
title = "Creating a blog with Bazel"
template = "series.html"
sort_by = "slug"
transparent = true
[extra]
series = true
+++
This series explores building and deploying a blog using Bazel, covering everything from basic Kotlin compilation to writing custom Bazel rules.