Migrate from Bazel

2025-12-12 15:30:04 +11:00 · 2025-12-12 15:30:04 +11:00 · 016dbd0814
commit 016dbd0814
59 changed files with 7044 additions and 0 deletions
--- a/content/posts/creating-a-blog-with-bazel/04-writing-bazel-rule-set/index.md
+++ b/content/posts/creating-a-blog-with-bazel/04-writing-bazel-rule-set/index.md
@ -0,0 +1,501 @@
+++
+template = "article.html"
+title = "Writing a Bazel rule set"
+date = 2020-05-16T15:55:00+11:00
+description = "Learn how to write custom Bazel rules by integrating PlantUML, including rule implementation and testing strategies."
+
+[taxonomies]
+tags = ["bazel", "plantuml"]
+++
+
+This post will cover two things:
+
+- How to run an arbitrary tool with Bazel (in this case,
+[PlantUML](https://plantuml.com/), a tool to generate diagrams), by writing a
+rule set
+- How to test this rule set.
+
+It should be mentioned that while I was working on this rule set, it became more
+and more apparent PlantUML is not a great candidate for this kind of
+integration, as its output is platform-dependent (the font rendering). Despite
+that, it's still a simple tool and as such its integration is simple, albeit not
+perfect (the rendering tests I wrote need to run on the same platform every
+time).
+
+## PlantUML usage
+
+PlantUML is a tool that takes a text input looking like this:
+
+```
+@startuml
+Alice -> Bob: SYN
+@enduml
+```
+
+And outputs an image looking like this:
+
+{% mermaid(caption="PlantUML sample output") %}
+sequenceDiagram
+    Alice->>Bob: SYN
+{% end %}
+
+PlantUML has multiple ways of being invoked (CLI, GUI, as well as a _lot_ of
+integrations with different tools), but we'll go with the easiest: a one-shot
+CLI invocation. It takes as inputs:
+
+- A text file, representing a diagram
+- An optional configuration file, giving control over the output
+
+It then outputs a single image file, which can be of different formats (we'll
+just cover SVG and PNG in this article, but adding support for other formats is
+trivial).
+
+PlantUML ships as a JAR file, which needs to be run with Java. An invocation
+generating the sample image above would look like this:
+
+```bash
+java -jar plantuml.jar -tpng -p < 'mysource.puml' > 'dir/myoutput.png'
+```
+
+Pretty straightforward: run the JAR with a single option for the image type,
+pipe in the content of the input file and get the output file back. The `-p`
+flag is the short form of `-pipe`. We use it because piping is the only way of
+properly controlling the output path (without it, PlantUML tries to be smart
+and places the output next to the input).
+
+With a configuration file:
+
+```bash
+java -jar plantuml.jar -tpng -config config.puml -p < 'mysource.puml' > 'dir/myoutput.png'
+```
+
+Simple enough, right? Well, not really. PlantUML actually embeds some
+metadata in the files it generates. For example, when generating an SVG:
+
+```svg
+<!-- The actual SVG image has been omitted, as this part is deterministic and
+pretty long. -->
+
+<svg><g>
+<!--MD5=[8d4298e8c40046c92682b92efe1f786e]
+@startuml
+Alice -> Bob: SYN
+@enduml
+
+PlantUML version 1.2020.07(Sun Apr 19 21:42:40 AEST 2020)
+(GPL source distribution)
+Java Runtime: OpenJDK Runtime Environment
+JVM: OpenJDK 64-Bit Server VM
+Java Version: 11.0.6+10
+Operating System: Linux
+Default Encoding: UTF-8
+Language: en
+Country: AU
+--></g></svg>
+```
+
+This makes PlantUML non-hermetic by default (in addition to the font issue
+mentioned earlier). While PlantUML has a simple way of working around that (in
+the form of a `-nometadata` flag), this is something to keep in mind when
+integrating a tool with Bazel: is this tool usable in a hermetic way? If not,
+how can the impact of this non-hermeticity be minimised?
+
+From there, here is the invocation we'll work with:
+
+```bash
+java -jar plantuml.jar -tpng -nometadata -config config.puml \
+  -p < 'mysource.puml' > 'dir/myoutput.png'
+```
+
+## Getting PlantUML
+
+PlantUML is a Java application, available as a JAR on Maven. As such, it can be
+fetched with the help of
+[rules_jvm_external](https://github.com/bazelbuild/rules_jvm_external/), as was
+explained in
+[a previous article](@/posts/creating-a-blog-with-bazel/02-compiling-a-kotlin-application-with-bazel/index.md#dependencies).
+The Maven rules will expose the JAR as a library, but we need a binary to be
+able to run it. In e.g. `//third_party/plantuml/BUILD`:
+
+```python
+load("@rules_java//java:defs.bzl", "java_binary")
+
+java_binary(
+    name = "plantuml",
+    main_class = "net.sourceforge.plantuml.Run",
+    visibility = ["//visibility:public"],
+    runtime_deps = [
+        "@maven//:net_sourceforge_plantuml_plantuml",
+    ],
+)
+```
+
+From there, we can use `//third_party/plantuml` like any Bazel binary target - we
+can run it with `bazel run`, and we can pass it as a tool for rule actions.
+
+This is a pattern that works well for any JVM-based tool. Other kinds of tools
+will need a different preparation step to make them available through Bazel -
+but as long as you can get a binary, you should be good.
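+
+For reference, the Maven side of this could look like the sketch below, in the
+`WORKSPACE` file. It assumes `rules_jvm_external` is already set up as in the
+previous article. The version number and the repository URL are assumptions, to
+be adapted to whatever release you actually want to pin.
+
+```python
+load("@rules_jvm_external//:defs.bzl", "maven_install")
+
+maven_install(
+    artifacts = [
+        # Assumed coordinates: the version is only an example.
+        "net.sourceforge.plantuml:plantuml:1.2020.7",
+    ],
+    repositories = [
+        "https://repo1.maven.org/maven2",
+    ],
+)
+```
+
+With the default repository name (`maven`), this is what makes the
+`@maven//:net_sourceforge_plantuml_plantuml` label used above resolvable.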
+
+## Rule set structure
+
+This rule set will follow the same structure we previously used for
+[Ktlint](@/posts/creating-a-blog-with-bazel/02-compiling-a-kotlin-application-with-bazel/index.md#ktlint):
+
+- Based in `//tools/plantuml`
+- A public interface exposed in `//tools/plantuml/defs.bzl`
+- Internal actions definition in `//tools/plantuml/internal/actions.bzl`
+- Internal rule definition in `//tools/plantuml/internal/rules.bzl`
+
+But in addition:
+
+- Tests for the actions in `//tools/plantuml/internal/actions_test.bzl`
+- Integration tests in `//tools/plantuml/tests`
+
+Let's start by defining our actions.
+
+## Actions
+
+### Implementation
+
+We need only one action for our rule: one that takes a source file, an optional
+configuration file, the PlantUML binary, and emits the output file by calling
+PlantUML. Let's assume for a moment we have a helper function which, given the
+proper input, returns the PlantUML command line to call, called
+`plantuml_command_line`, and write the action from there:
+
+```python
+def plantuml_generate(ctx, src, format, config, out):
+    """Generates a single PlantUML graph from a puml file.
+
+    Args:
+        ctx: analysis context.
+        src: source file to be read.
+        format: the output image format.
+        config: the configuration file. Optional.
+        out: output image file.
+    """
+    command = plantuml_command_line(
+        executable = ctx.executable._plantuml_tool.path,
+        config = config.path if config else None,
+        src = src.path,
+        output = out.path,
+        output_format = format,
+    )
+
+    inputs = [src]
+
+    if config:
+        inputs.append(config)
+
+    ctx.actions.run_shell(
+        outputs = [out],
+        inputs = inputs,
+        tools = [ctx.executable._plantuml_tool],
+        command = command,
+        mnemonic = "PlantUML",
+        progress_message = "Generating %s" % out.basename,
+    )
+```
+
+This is pretty straightforward: we generate the command line, passing the path
+of each file (or `None` for the configuration file if it's not provided, since
+it's optional), as well as the requested image format. We declare the source
+file and, when present, the configuration file as inputs, and the PlantUML
+binary as a required tool.
+
+Now let's implement our helper function. Here again, it's really
+straightforward: it takes a bunch of paths as input, and needs to generate a
+command line call (in the form of a simple string) from them:
+
+```python
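+# Goes at the top of actions.bzl: provides shell.quote().
+load("@bazel_skylib//lib:shell.bzl", "shell")
+
+def plantuml_command_line(executable, config, src, output, output_format):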
+    """Formats the command line to call PlantUML with the given arguments.
+
+    Args:
+        executable: path to the PlantUML binary.
+        config: path to the configuration file. Optional.
+        src: path to the source file.
+        output: path to the output file.
+        output_format: image format of the output file.
+
+    Returns:
+        A command to invoke PlantUML
+    """
+
+    command = "%s -nometadata -p -t%s " % (
+        shell.quote(executable),
+        output_format,
+    )
+
+    if config:
+        command += " -config %s " % shell.quote(config)
+
+    command += " < %s > %s" % (
+        shell.quote(src),
+        shell.quote(output),
+    )
+
+    return command
+```
+
+An interesting note is that because PlantUML is already integrated as an
+executable Bazel target, we don't care whether it's a JAR, a C++ binary or a shell
+script: Bazel knows exactly what this executable is made of, how to prepare
+(e.g. compile) it if necessary, its runtime dependencies (in this case, a JRE)
+and, more importantly in this context, how to run it. We can treat our tool
+target as a single executable file, and run it as such just from its path.
+Bazel will automatically make sure to provide us with everything we need. (For
+more details: the target actually points to a shell script generated by Bazel,
+through the Java rules, which in the case of a `java_binary` target is
+responsible for defining the classpath, among other things. The JAR file is
+merely a dependency of this shell script, and as such is provided as a runtime
+dependency.)
+
+Writing this as a helper function rather than directly in the action definition
+serves two purposes: not only does it make the whole thing slightly easier to
+read, but this function, which contains the logic (even though in this case it's
+really simple), is easily testable: it takes only strings as arguments, and
+returns a string. It's also a pure function: it doesn't have any side effects,
+and as such it will always return the same output given the same set of inputs.
+
+### Tests
+
+To test Starlark functions like this one, Bazel's
+[Skylib](https://github.com/bazelbuild/bazel-skylib) provides a test framework
+which, while requiring a bit of boilerplate, is pretty simple to use. In this
+specific case, we only have two different cases to test: with and without
+configuration file provided. Error cases should be unreachable due to the way
+the rule will be defined: Bazel will be responsible for enforcing the presence
+of an executable target for PlantUML's binary, a valid image format... Let's see
+how that works. In `//tools/plantuml/internal/actions_test.bzl`:
+
+```python
+"""Unit tests for PlantUML action"""
+
+load("@bazel_skylib//lib:unittest.bzl", "asserts", "unittest")
+load(":actions.bzl", "plantuml_command_line")
+
+def _no_config_impl(ctx):
+    env = unittest.begin(ctx)
+    asserts.equals(
+        env,
+        "'/bin/plantuml' -nometadata -p -tpng  < 'mysource.puml' > 'dir/myoutput.png'",
+        plantuml_command_line(
+            executable = "/bin/plantuml",
+            config = None,
+            src = "mysource.puml",
+            output = "dir/myoutput.png",
+            output_format = "png",
+        ),
+    )
+    return unittest.end(env)
+
+no_config_test = unittest.make(_no_config_impl)
+
+def _with_config_impl(ctx):
+    env = unittest.begin(ctx)
+    asserts.equals(
+        env,
+        "'/bin/plantuml' -nometadata -p -tpng  -config 'myskin.skin'  < 'mysource.puml' > 'dir/myoutput.png'",
+        plantuml_command_line(
+            executable = "/bin/plantuml",
+            config = "myskin.skin",
+            src = "mysource.puml",
+            output = "dir/myoutput.png",
+            output_format = "png",
+        ),
+    )
+    return unittest.end(env)
+
+with_config_test = unittest.make(_with_config_impl)
+
+def actions_test_suite():
+    unittest.suite(
+        "actions_tests",
+        no_config_test,
+        with_config_test,
+    )
+```
+
+First, we define two functions, which are the actual test logic:
+`_no_config_impl` and `_with_config_impl`. Their content is pretty simple: we
+start a unit test environment, we invoke the function under test and assert
+that the result is indeed what we expected, and we close the unit test
+environment. The return value is needed by the test framework, as it carries
+which assertions passed or failed.
+
+Next, we declare those two functions as actual unit tests, wrapping them with a
+call to `unittest.make`. We can then add those two tests to a test suite, which
+is what actually generates the test targets when invoked. This means that the
+`actions_test_suite` macro needs to be invoked in the `BUILD` file:
+
+```python
+load(":actions_test.bzl", "actions_test_suite")
+
+actions_test_suite()
+```
+
+We can run our tests, and hopefully everything should pass:
+
+```bash
+$ bazel test //tools/plantuml/internal:actions_tests
+INFO: Invocation ID: 112bd049-7398-4b23-b62b-1398e9731eb7
+INFO: Analyzed 2 targets (5 packages loaded, 927 targets configured).
+INFO: Found 2 test targets...
+INFO: Elapsed time: 0.238s, Critical Path: 0.00s
+INFO: 0 processes.
+//tools/plantuml/internal:actions_tests_test_0                           PASSED in 0.4s
+//tools/plantuml/internal:actions_tests_test_1                           PASSED in 0.3s
+
+Executed 0 out of 2 tests: 2 tests pass.
+INFO: Build completed successfully, 1 total action
+```
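+
+One case those two tests don't exercise is quoting. As a sketch, a third test
+could check that paths containing spaces stay properly quoted. The file names
+here are made up for the example, and the expected string assumes `shell.quote`
+wraps its argument in single quotes, as the expected strings above suggest:
+
+```python
+def _quoted_paths_impl(ctx):
+    env = unittest.begin(ctx)
+    asserts.equals(
+        env,
+        # Each path is quoted as a whole, spaces included.
+        "'/bin/plantuml' -nometadata -p -tsvg  < 'my source.puml' > 'dir/my output.svg'",
+        plantuml_command_line(
+            executable = "/bin/plantuml",
+            config = None,
+            src = "my source.puml",
+            output = "dir/my output.svg",
+            output_format = "svg",
+        ),
+    )
+    return unittest.end(env)
+
+quoted_paths_test = unittest.make(_quoted_paths_impl)
+```
+
+It would then need to be added to the `unittest.suite` call in
+`actions_test_suite`, alongside the two existing tests.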
+
+## Rules definition
+
+As with the actions definition, we only have one rule to define here. Let's
+call it `plantuml_graph()`. It takes our usual set of inputs, and outputs a
+single file, whose name will be `{target_name}.{image_format}`. It's also where
+we define the set of acceptable image formats, the fact that the input file is
+mandatory but the configuration file optional, and the actual executable target
+to use for PlantUML. The only thing the implementation actually does is, as
+expected, call our `plantuml_generate` action defined above.
+
+```python
+load(
+    ":actions.bzl",
+    "plantuml_generate",
+)
+
+def _plantuml_graph_impl(ctx):
+    output = ctx.actions.declare_file("{name}.{format}".format(
+        name = ctx.label.name,
+        format = ctx.attr.format,
+    ))
+    plantuml_generate(
+        ctx,
+        src = ctx.file.src,
+        format = ctx.attr.format,
+        config = ctx.file.config,
+        out = output,
+    )
+
+    return [DefaultInfo(
+        files = depset([output]),
+    )]
+
+plantuml_graph = rule(
+    _plantuml_graph_impl,
+    attrs = {
+        "config": attr.label(
+            doc = "Configuration file to pass to PlantUML. Useful to tweak the skin",
+            allow_single_file = True,
+        ),
+        "format": attr.string(
+            doc = "Output image format",
+            default = "png",
+            values = ["png", "svg"],
+        ),
+        "src": attr.label(
+            allow_single_file = [".puml"],
+            doc = "Source file to generate the graph from",
+            mandatory = True,
+        ),
+        "_plantuml_tool": attr.label(
+            default = "//third_party/plantuml",
+            executable = True,
+            cfg = "host",
+        ),
+    },
+    outputs = {
+        "graph": "%{name}.%{format}",
+    },
+    doc = "Generates a PlantUML graph from a puml file",
+)
+```
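+
+To give an idea of what a consumer looks like, here is a minimal sketch of a
+`BUILD` file using the rule through the public interface in
+`//tools/plantuml/defs.bzl`. The target and file names are hypothetical:
+
+```python
+load("//tools/plantuml:defs.bzl", "plantuml_graph")
+
+# Renders handshake.puml to handshake.svg.
+plantuml_graph(
+    name = "handshake",
+    src = "handshake.puml",
+    format = "svg",
+)
+
+# Same source, default format (png), with a configuration file.
+plantuml_graph(
+    name = "handshake_themed",
+    src = "handshake.puml",
+    config = "theme.puml",
+)
+```
+
+Building `:handshake` should then produce `handshake.svg` in the package's
+output directory.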
+
+## Public interface
+
+As we only have a single rule, and nothing else specific to do, the public
+interface is dead simple:
+
+```python
+load("//tools/plantuml/internal:rules.bzl", _plantuml_graph = "plantuml_graph")
+
+plantuml_graph = _plantuml_graph
+```
+
+You might then be wondering: why is this useful, and why shouldn't I just import
+the rule definition from `//tools/plantuml/internal:rules.bzl` directly? Having
+this kind of public interface allows you to tweak the actual rule definition
+without breaking any consumer site, as long as you respect the public interface.
+You can also add features to every consumer site in a really simple way. Let's
+imagine, for example, that you have a `view_image` rule which, given an image
+file, generates a script to view it. You could then transform your public
+interface like this:
+
+```python
+load("//tools/plantuml/internal:rules.bzl", _plantuml_graph = "plantuml_graph")
+load("//tools/utils:defs.bzl", _view_image = "view_image")
+
+def plantuml_graph(name, src, config = None, format = "png"):
+    _plantuml_graph(
+        name = name,
+        src = src,
+        config = config,
+        format = format,
+    )
+
+    _view_image(
+        name = "%s.view" % name,
+        src = ":%s.%s" % (name, format),
+    )
+```
+
+And suddenly, all your PlantUML graphs have an implicit `.view` target defined
+automatically, allowing you to see the output directly without having to dig in
+Bazel's output directories.
+
+A set of Bazel rules for LaTeX actually provides such a feature to view the PDF
+output: they have a
+[`view_pdf.sh` script](https://github.com/ProdriveTechnologies/bazel-latex/blob/master/view_pdf.sh),
+used by their main
+[`latex_document` macro](https://github.com/ProdriveTechnologies/bazel-latex/blob/master/latex.bzl#L45).
+
+## Further testing
+
+For a rule this simple, I took just one simple further step: keeping a few
+reference PlantUML graphs, as well as their expected rendered output, which I
+compare through Phosphorus, a really simple tool I wrote to help compare two
+images, covered in the previous article (I told you it would be useful!). But
+for more complex cases, Skylib offers more utilities, like an
+[analysis test](https://github.com/bazelbuild/bazel-skylib/blob/master/docs/analysis_test_doc.md),
+and a
+[build test](https://github.com/bazelbuild/bazel-skylib/blob/master/docs/build_test_doc.md).
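+
+As an illustration of the latter, a build test only checks that the listed
+targets build, which is already enough to catch a broken PlantUML invocation. A
+minimal sketch, with hypothetical target and file names, in e.g.
+`//tools/plantuml/tests/BUILD`:
+
+```python
+load("@bazel_skylib//rules:build_test.bzl", "build_test")
+load("//tools/plantuml:defs.bzl", "plantuml_graph")
+
+plantuml_graph(
+    name = "sequence_png",
+    src = "sequence.puml",
+)
+
+plantuml_graph(
+    name = "sequence_svg",
+    src = "sequence.puml",
+    format = "svg",
+)
+
+# Passes as long as both graphs can be built.
+build_test(
+    name = "graphs_build_test",
+    targets = [
+        ":sequence_png",
+        ":sequence_svg",
+    ],
+)
+```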
+
+## Closing thoughts
+
+While writing this kind of tool might look like a lot of work, it's actually
+pretty mechanical in a lot of cases. I worked on a few others like
+[markdownlint](https://github.com/igorshubovych/markdownlint-cli), which now
+runs on all my Markdown files as regular Bazel test targets, or
+[pngcrush](https://pmt.sourceforge.io/pngcrush/), which is run on the PNG files
+hosted on this blog. In a monorepo, writing such a rule is the kind of task that
+you do once, and it just keeps on giving - you can easily compose different
+rules around a main use-case, with a bunch of test targets generated virtually
+for free.
+
+On another note, I'm aware that having all this in a public repository would
+make things much simpler to follow. Sadly, it's part of a larger mono-repository
+which makes open-sourcing only the relevant parts tricky. Dumping a snapshot
+somewhere would be an option, but I'd rather have an actual living repository.
+
+Now that we have all the tools we need (that was kind of convoluted, I'll give
+you that), there are only two steps left to cover:
+
+- Generating the actual blog (ironically enough, this will be a really quick
+step, despite being the only really important one)
+- Managing the deployment.
+
+We're getting there!