Simplify Your CI Pipeline Configuration with Jsonnet
Disclaimer: This content reflects my personal opinions, not those of any organizations I am or have been affiliated with. Code samples are provided for illustration purposes only; use them with caution and test thoroughly before deployment.
This post is also featured on the DAZN Engineering Blog.
Most CI/CD (Continuous Integration/Continuous Delivery) tools nowadays support some form of configuration file so that pipelines can be properly version controlled. For example, Travis CI, GitLab CI, CircleCI and Drone CI use YAML files, while Jenkins uses its own DSL. These YAML-based configuration files are easy to read and edit, but they don’t scale well as the file grows. This problem can be solved with a nice data templating language called Jsonnet. In this post we’ll demonstrate the Drone CI v1.0 configuration file format, but the idea can easily be applied to other CI tools.
The problem with YAML-based CI configuration files
The first problem is that pipelines become hard to reason about as you add more and more conditional builds. When we use git with CI pipelines, we usually end up with a separate pipeline for each scenario. For example, imagine a Node.js web service: when we push a feature branch (i.e. a non-master branch), we want to trigger the build and unit test steps. When the pull request is approved and we merge the branch to master, we want it to build, unit test, deploy to our dev environment, and then run integration tests against it. Once we are done with testing in the dev environment, we can use Drone CI’s CLI to trigger a deployment to stage, which takes the build from the previous master branch build, deploys it to the stage environment, and runs integration tests against it. The same applies to deployment to production, but we’ll leave that out to keep the example simple. To summarize, we want the following pipelines:
When pushing to a non-master branch
- build
- unit_test
When merging to master
- build
- unit_test
- deploy_dev
- integration_test
When manually deploying to stage
- deploy_stage
- integration_test
Drone and many other CI solutions let you achieve this with conditions. A Drone config file for the above pipelines looks like this:
kind: pipeline
name: default
steps:
  - name: build
    image: node:8.6.0
    commands:
      - npm install
      - npm run build
    when:
      event:
        - push
  - name: unit_test
    image: node:8.6.0
    commands:
      - npm run unit_test
    when:
      event:
        - push
  - name: integration_test
    image: node:8.6.0
    commands:
      - npm run integration_test
  - name: deploy_dev
    image: node:8.6.0
    commands:
      - npm run deploy -- --env=dev
    when:
      event:
        - push
      branch:
        - master
  - name: deploy_stage
    image: node:8.6.0
    commands:
      - npm run deploy -- --env=stage
    when:
      event:
        - promote
      environment:
        - stage
The event: promote condition in the deploy_stage step is triggered by a CLI call: drone promote <repo/name> <build> <environment>. This is how manual deployments are triggered in Drone. Don’t worry if you don’t fully understand how this works; it’s not critical to our discussion.
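For example, promoting a hypothetical build to stage might look like this (the repository name and build number here are made up):

drone promote our-org/our-service 42 stage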
Now imagine you are new to the project and reading this Drone pipeline: what happens when you push a feature branch? First you have to read through all the steps. For each step you need to check whether the when conditions match the scenario you care about, then write down all the steps that matched. Be careful: a step without any when condition runs in every situation. You end up doing a lot of processing in your head just to figure out what runs when. It’s also very easy to add a new step with the wrong condition and have it run in an unexpected situation. The pipeline configuration we just created is essentially a tree, and we apply conditions to it to select a branch.
The pipeline would be much easier to follow if we duplicated the build and test steps and enumerated every combination with when blocks. But that way we’d end up with eight steps, each with a different when condition, while most of the code is duplicated. We’ll solve this with Jsonnet after we explain the second problem.
The second problem is code duplication. YAML provides anchors to cut down on repetition, but they only work at key-value granularity. A simple YAML anchor looks like this:
anchors:
  - &anchor_job
    job: programmer
    duty: code and debug
employees:
  - name: Alice
    <<: *anchor_job
  - name: Bob
    <<: *anchor_job
In this example, we defined an anchor called &anchor_job, which contains two keys, job: programmer and duty: code and debug. In the employees list, we use <<: *anchor_job to inline it into the name: Alice object. The keys from &anchor_job are merged into the name: Alice object, and the result becomes:
employees:
  - name: Alice
    job: programmer
    duty: code and debug
  - name: Bob
    job: programmer
    duty: code and debug
However, this mechanism only works at the key-value level; you can’t parameterize part of a value. Let’s say we are going to deploy the imaginary service to multiple AWS regions for resilience, giving us even more combinations. With 3 environments, dev, stage and prod (production), and 2 regions, eu-central-1 and us-west-1, we have 3 x 2 = 6 deployment combinations. Even if we use YAML anchors to avoid repeating the when part, we still repeat a lot of code:
aliases-deployment-triggers:
  # The common `when` condition for promoting to stage
  - &when_deploy_stage
    when:
      event:
        - promote
      environment:
        - stage
  # The common `when` condition for promoting to prod
  - &when_deploy_prod
    when:
      event:
        - promote
      environment:
        - prod
kind: pipeline
name: default
steps:
  # ... omitting some build and test steps for simplicity
  - name: deploy_stage_eu
    image: node:8.6.0
    commands:
      # Assume our deploy script takes the parameters env and region
      - npm run deploy -- --env=stage --region=eu-central-1
    <<: *when_deploy_stage # Using the anchor here
  - name: deploy_stage_us
    image: node:8.6.0
    commands:
      - npm run deploy -- --env=stage --region=us-west-1
    <<: *when_deploy_stage
  - name: deploy_prod_eu
    image: node:8.6.0
    commands:
      - npm run deploy -- --env=prod --region=eu-central-1
    <<: *when_deploy_prod
  - name: deploy_prod_us
    image: node:8.6.0
    commands:
      - npm run deploy -- --env=prod --region=us-west-1
    <<: *when_deploy_prod
Notice that even though when_deploy_stage reduces the repetition of the when blocks, we can’t abstract out the npm run deploy -- --env=<environment> --region=<region> line or the step name, because we can’t parameterize the environment and region parts within a value. The good news is that Jsonnet solves both problems we discussed. Next we’ll give a short introduction to Jsonnet and explain how.
Jsonnet
Jsonnet is an open source templating language based on JSON. Its backbone is still plain JSON, but it adds variables, conditionals, functions, arithmetic and more. It also has a nice linter, formatter and IDE integrations, plus a well-designed standard library with utilities for string manipulation, math, and functional tools like map and fold.
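Here is a trivial, self-contained taste of those features (all values are made up; run it with jsonnet to see the JSON it emits):

// features_demo.jsonnet
local replicas = 3;
{
  replicas: replicas,
  doubled: replicas * 2,  // arithmetic
  tier: if replicas > 1 then 'ha' else 'single',  // conditional
  // std.map and std.range come from the standard library
  names: std.map(function(i) 'node-%d' % i, std.range(1, replicas)),
}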
Jsonnet source code passes through a compiler, which emits JSON. On macOS you can install it easily with brew install jsonnet. Although Drone now natively supports Jsonnet, my team still runs an older version of Drone, so we decided to compile Jsonnet to JSON, then use the json2yaml tool to convert it to YAML. We then commit both the Jsonnet source and the generated YAML file to our git repository.
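Our wrapper boils down to something like the following sketch (assuming the source lives in .drone.jsonnet and a json2yaml CLI that reads JSON from stdin; adjust the invocation to whichever converter you use):

# Compile the Jsonnet source to JSON, then convert it to YAML for Drone.
jsonnet .drone.jsonnet | json2yaml > .drone.yml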
Avoiding repetition with functions
So let’s solve the repetition problem with Jsonnet functions. The moving parts in our deploy step are the environment and the region, so we can define a function that takes those two parameters and performs string interpolation:
// demo1.jsonnet
local deploy(env, region) =
  {
    name: 'deploy_%(env)s_%(region)s' % { env: env, region: region },
    image: 'node:8.6.0',
    commands: [
      'npm run deploy -- --env=%(env)s --region=%(region)s' % { env: env, region: region },
    ],
    when: {
      event: ['promote'],
      environment: [env],
    },
  };

// Calling the function
{
  steps: [
    deploy('stage', 'eu-central-1'),
    deploy('stage', 'us-west-1'),
    deploy('prod', 'eu-central-1'),
    deploy('prod', 'us-west-1'),
  ],
}
Let’s take a closer look at the name field. Jsonnet supports old Python-style string formatting (the % operator). In the template string, %(env)s looks up the env key in the object following the % operator; the s at the end of %(...) means we want to format the value as a string.
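As a standalone illustration (hypothetical values):

// format_demo.jsonnet
{
  // '%(env)s' looks up the key 'env' in the object after '%'
  name: 'deploy_%(env)s_%(region)s' % { env: 'dev', region: 'eu-central-1' },
  // emits: "name": "deploy_dev_eu-central-1"
}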
If we run jsonnet demo1.jsonnet, the following is printed to STDOUT:
{
  "steps": [
    {
      "commands": [
        "npm run deploy -- --env=stage --region=eu-central-1"
      ],
      "image": "node:8.6.0",
      "name": "deploy_stage_eu-central-1",
      "when": {
        "environment": [
          "stage"
        ],
        "event": [
          "promote"
        ]
      }
    },
    {
      "commands": [
        "npm run deploy -- --env=stage --region=us-west-1"
      ],
      "image": "node:8.6.0",
      "name": "deploy_stage_us-west-1",
      "when": {
        "environment": [
          "stage"
        ],
        "event": [
          "promote"
        ]
      }
    },
    {
      "commands": [
        "npm run deploy -- --env=prod --region=eu-central-1"
      ],
      "image": "node:8.6.0",
      "name": "deploy_prod_eu-central-1",
      "when": {
        "environment": [
          "prod"
        ],
        "event": [
          "promote"
        ]
      }
    },
    {
      "commands": [
        "npm run deploy -- --env=prod --region=us-west-1"
      ],
      "image": "node:8.6.0",
      "name": "deploy_prod_us-west-1",
      "when": {
        "environment": [
          "prod"
        ],
        "event": [
          "promote"
        ]
      }
    }
  ]
}
We generated 64 lines of JSON from just 23 lines of Jsonnet, and the Jsonnet source is much easier to read!
Define separate pipelines for each scenario
The next question is how to structure our Jsonnet code so that we can easily understand which steps are included in each scenario (e.g. push to non-master, merge to master, etc.). We first define the building blocks, the steps:
local build = {
  name: 'build',
  image: 'node:8.6.0',
  commands: [
    'npm install',
    'npm run build',
  ],
};

local unitTest = {
  name: 'unit_test',
  image: 'node:8.6.0',
  commands: [
    'npm run unit_test',
  ],
};

local integrationTest = {
  name: 'integration_test',
  image: 'node:8.6.0',
  commands: [
    'npm run integration_test',
  ],
};

local deploy(env, region) =
  {
    name: 'deploy_%(env)s_%(region)s' % { env: env, region: region },
    image: 'node:8.6.0',
    commands: [
      'npm run deploy -- --env=%(env)s --region=%(region)s' % { env: env, region: region },
    ],
  };
Then we can start composing our pipelines from these steps. First we define the list of steps we want when pushing to a non-master branch:
local commitToNonMasterSteps = [
  build,
  unitTest,
];
We want these steps to run only on a push to a non-master branch, so we use std.map to add the conditional (i.e. the when block) to each step:
local whenCommitToNonMaster(step) = step {
  when: {
    event: ['push'],
    branch: {
      exclude: ['master'],
    },
  },
};

local commitToNonMasterSteps = std.map(whenCommitToNonMaster, [
  build,
  unitTest,
]);
The whenCommitToNonMaster function adds the when block to the step you pass in. The syntax step { when: ... } means “merge the step object with the { when: ... } object”. This function is then applied to every step using std.map.
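If the object merge syntax is new to you, here is a minimal, standalone illustration (the values are made up):

local step = { name: 'build', image: 'node:8.6.0' };
// `step { when: ... }` is sugar for `step + { when: ... }`
step { when: { event: ['push'] } }
// emits: { "image": "node:8.6.0", "name": "build", "when": { "event": [ "push" ] } }

The same pattern can then be applied to other scenarios, for example when we merge to master: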
local whenMergeToMaster(step) = step {
  when: {
    event: ['push'],
    branch: ['master'],
  },
};

local mergeToMasterSteps = std.map(whenMergeToMaster, [
  build,
  unitTest,
  deploy('dev', 'eu-central-1'),
  deploy('dev', 'us-west-1'),
  integrationTest,
]);
We deliberately repeat the build and unitTest steps here so we can clearly see what is included in the “merge to master” pipeline. In the generated code there will be two copies of the build step, one with the when block for pushing to non-master and another with the when block for merging to master; the same goes for the unitTest step. We carry on defining the other scenarios and their lists of steps in the same way, as sketched below. In the end we have a list of scenarios, each containing a list of steps, and we can flatten them all into one big list of every possible step using the std.flattenArrays() function.
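For completeness, the deployToStageSteps and deployToProdSteps lists referenced below follow exactly the same pattern; a sketch, reusing the promote conditions from the YAML version earlier:

local whenDeployToStage(step) = step {
  when: {
    event: ['promote'],
    environment: ['stage'],
  },
};

local deployToStageSteps = std.map(whenDeployToStage, [
  deploy('stage', 'eu-central-1'),
  deploy('stage', 'us-west-1'),
  integrationTest,
]);

local whenDeployToProd(step) = step {
  when: {
    event: ['promote'],
    environment: ['prod'],
  },
};

local deployToProdSteps = std.map(whenDeployToProd, [
  deploy('prod', 'eu-central-1'),
  deploy('prod', 'us-west-1'),
  integrationTest,
]);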
local pipelines = std.flattenArrays([
  commitToNonMasterSteps,  // build, unitTest
  mergeToMasterSteps,  // build, unitTest, deploy_dev_eu, deploy_dev_us, integrationTest
  deployToStageSteps,  // deploy_stage_eu, deploy_stage_us, integrationTest
  deployToProdSteps,  // deploy_prod_eu, deploy_prod_us, integrationTest
]);
// Below is the actual JSON object that will be emitted by the jsonnet compiler.
{
  kind: 'pipeline',
  name: 'default',
  steps: pipelines,
}
With this architecture, anyone who reads the Drone configuration can clearly see the list of scenarios (the pipelines list). To see which steps are executed in each scenario, simply go to the definition of the scenario variable (e.g. commitToNonMasterSteps).
A side note about Jsonnet vs. JavaScript
You might wonder why we chose Jsonnet over JavaScript; after all, you could achieve the same effect by building the config object in JavaScript and printing it with JSON.stringify(). One reason is that Jsonnet has been natively supported by Drone CI since v1.0, so it makes sense to use it directly. Another is that Jsonnet’s syntax is built around plain JSON, with a relatively limited set of functions and operators, which forces you to focus on the data rather than the algorithm. With JavaScript you might be tempted to pull in all sorts of NPM libraries and write complex logic that is hard to trace and debug. Jsonnet’s design also leads you to write functional rather than procedural code, so if you are into functional programming it will be a natural fit. Technically, though, Jsonnet is no better than plain JavaScript, so feel free to choose whichever fits your existing pipeline and your team’s expertise.
Conclusions
We discussed the problems with writing CI pipelines in plain YAML. The first problem is that complex conditionals controlling which steps run in which scenario quickly make the pipeline hard to reason about. The second is that even with YAML anchors we can’t eliminate all the repetition. Jsonnet solves both: we eliminate repetition with Jsonnet functions and string interpolation, and we tame the conditionals by structuring the code so that every scenario explicitly enumerates its steps. Thanks to Jsonnet templating, we can be explicit while keeping the code concise and clean.