Pull requests

In this post I am collecting pull requests that I authored and either enjoyed writing or found particularly interesting to work on.

November 2020

November was my last month with GDS. I hope I left things better than I found them, and I know that I learned a lot along the way.

GOV.UK PaaS

November was my last month with GOV.UK PaaS. I spent some time with the irreplaceable @cmcnallygds and she rendered readable my attempt at English words describing isolation segments.

I also added set-pipeline support to pipecleaner which is a tool for linting Concourse pipelines.

GOV.UK Notify

The cell broadcasting work started to move faster, with a spate of PRs:

I’m very excited to see where this notification channel goes. As above, November is my last month with GDS. Notify will continue upwards and to the right, and it has been my privilege to work in the orbit of the core team.

October 2020

GOV.UK PaaS

RDS database users can now get RDS to upgrade their database to the latest minor version. Amazon doesn’t do this for you during the maintenance window unless the minor upgrade addresses a severe security CVE.

The GOV.UK PaaS metrics exporter is an interesting codebase. I had previously added some not very good integration tests, so a pull request using Gomega’s Eventually was overdue.

[BOSH](https://bosh.io) was not cleaning up tasks often enough, and when it did, it ran out of memory. After manually clearing out the tasks using the BOSH Ruby console, we ensured that BOSH task cleanup runs daily. Bonus points for PR number.

As a team, we finally got around to merging our work on isolation segments, and we subsequently enabled egress-restricted isolation segments in London for the Document Checking Service.

I finally had enough of manually doing point-in-time restores for our tenants, and I’m sure our tenants were tired of raising support tickets for point-in-time restores. We added point-in-time database restores as a feature using the RDS native feature.

September 2020

GOV.UK PaaS

We allowed users to create read-only bindings to their PostgreSQL databases, and enabled the conduit plugin to specify bind parameters.

It is always fun writing RSpec tests for the YAML configuration of your cloud of choice. I added some tests to ensure that all our components are highly available.
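A check of this kind might look like the following minimal sketch. It uses plain Ruby and the standard yaml library rather than RSpec so that it runs on its own. The manifest, the instance group names, and the helper `single_instance_groups` are all hypothetical, and "at least two instances" is only an assumed definition of highly available, not necessarily the rule used in the real tests.

```ruby
require "yaml"

# Hypothetical BOSH-style manifest; names and counts are made up for illustration.
MANIFEST_YAML = <<~YAML
  instance_groups:
    - name: router
      instances: 3
    - name: api
      instances: 2
    - name: scheduler
      instances: 1
YAML

# Returns the names of instance groups that run fewer than two instances.
# Assumption: "highly available" means at least two instances per group.
def single_instance_groups(manifest)
  manifest.fetch("instance_groups")
          .select { |group| group.fetch("instances") < 2 }
          .map { |group| group.fetch("name") }
end

manifest = YAML.safe_load(MANIFEST_YAML)
offenders = single_instance_groups(manifest)
puts "Not highly available: #{offenders.join(', ')}" unless offenders.empty?
```

In an RSpec suite the same helper could back an expectation such as `expect(single_instance_groups(manifest)).to be_empty`.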

August 2020

Cloud Foundry

I had a great deal of fun implementing a feature where a Cloud Foundry operator can customise the error pages within the routing subsystem: gorouter HTML error templates.

A Windows 98 themed Gorouter error page

GOV.UK PaaS

We released autoscaling using the app-autoscaler. It was a great deal of fun to deploy the autoscaler within GOV.UK PaaS and to document how GOV.UK PaaS users can autoscale their apps.

Prometheus is nifty, and I will never pass up a chance to use the predict_linear function to generate alerts.
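For readers unfamiliar with it, `predict_linear` fits a straight line to recent samples by simple linear regression and extrapolates it a given number of seconds ahead. The Ruby below is a rough sketch of that idea only, not Prometheus's implementation; the function name `predict_linear_sketch`, the sample data, and the alert threshold are assumptions made for illustration.

```ruby
# samples: array of [timestamp_seconds, value] pairs with at least two distinct timestamps.
# Fits value = slope * time + intercept by least squares, then evaluates the
# line at (now + seconds_ahead).
def predict_linear_sketch(samples, now, seconds_ahead)
  n = samples.length.to_f
  mean_t = samples.sum { |t, _v| t } / n
  mean_v = samples.sum { |_t, v| v } / n
  covariance = samples.sum { |t, v| (t - mean_t) * (v - mean_v) }
  variance = samples.sum { |t, _v| (t - mean_t)**2 }
  slope = covariance / variance
  intercept = mean_v - slope * mean_t
  slope * (now + seconds_ahead) + intercept
end

# Hypothetical free disk space samples (percent), one per minute.
samples = [[0, 100.0], [60, 90.0], [120, 80.0]]
predicted = predict_linear_sketch(samples, 120, 600)
puts "alert: disk predicted to fill within 10 minutes" if predicted < 0
```

Alerting on the extrapolated value rather than the current one gives warning before a resource actually runs out.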

Ruby is a sharp knife, and has a few ergonomic features that can be dangerous, e.g. execution strings:

puts `echo hello world`
# is equivalent to
puts %x(echo hello world)

RuboCop is a Ruby linter which can be customised. I added a linting rule which rejects code using execution strings dangerously.
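One way execution strings become dangerous is when they interpolate input, because a shell then parses the result. The sketch below contrasts that with passing arguments separately via `Open3.capture2`. The helper names and the input are hypothetical, and this is not necessarily the exact pattern the linting rule targets.

```ruby
require "open3"

# Dangerous: the interpolated text is handed to a shell, which interprets
# metacharacters such as ";".
def unsafe_echo(text)
  `echo #{text}`
end

# Safer: the command and its argument are separate strings, so no shell
# parses the input.
def safe_echo(text)
  output, _status = Open3.capture2("echo", text)
  output
end

# Hypothetical untrusted input containing a shell metacharacter.
input = "hello; echo INJECTED"
puts unsafe_echo(input)
puts safe_echo(input)
```

With the unsafe version the second command runs and its output appears on its own line; the safe version prints the input verbatim.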

July 2020

GOV.UK PaaS

Each GOV.UK PaaS developer has their own development environment, which we endeavour to turn off unless it is needed. The development environments can be quite expensive, so now we use AWS spot instances:

June 2020

GOV.UK Notify incident

Following an incident on GOV.UK Notify I raised the following PRs:

Somewhat related, I paired on a PR to reduce the number of metrics-related DNS requests by 10-20x via some short TTL caching. This decreased p90 latency by >1s during peak traffic.
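The idea behind short TTL caching can be sketched as follows. This is not the code from that PR: the `TtlCache` class, the 15-second TTL, the hostname, and the address are hypothetical, and the clock is injectable purely so the behaviour can be demonstrated without sleeping.

```ruby
# A tiny time-to-live cache: a value is reused until it is older than the TTL,
# after which the block is called again to refresh it.
class TtlCache
  def initialize(ttl_seconds, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl_seconds
    @clock = clock
    @entries = {}
  end

  def fetch(key)
    now = @clock.call
    entry = @entries[key]
    return entry[:value] if entry && now < entry[:expires_at]

    value = yield
    @entries[key] = { value: value, expires_at: now + @ttl }
    value
  end
end

# Demonstration with a fake clock and a stand-in for a DNS lookup.
fake_now = 0.0
lookups = 0
cache = TtlCache.new(15, clock: -> { fake_now })
resolve = lambda do
  lookups += 1
  "192.0.2.10" # documentation-only address, standing in for a real answer
end

3.times { cache.fetch("metrics.example.internal") { resolve.call } }
fake_now = 20.0 # the entry has now expired
cache.fetch("metrics.example.internal") { resolve.call }
puts "lookups performed: #{lookups}"
```

Even a TTL of a few seconds collapses many lookups into one when every metric emission would otherwise resolve the same hostname.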

GitHub management

I had a lot of fun writing a Concourse pipeline to help us manage Ruby versions across GitHub repos.

Concourse

Getting Concourse pipelines to manage themselves is a very useful feature of Concourse. I raised a pull request to allow Concourse’s set-pipeline step to manage pipelines in other teams.

March / April / May 2020

GOV.UK PaaS

One of the scariest PRs I’ve raised changed how GOV.UK PaaS does automatic certificate rotation. It is related to a PR to ensure our CAs and certificates are generated correctly, which was raised after GOV.UK PaaS’s first P1 incident, itself caused by a certificate rotation bug.

GOV.UK PaaS has a service broker which provisions CDNs. I raised a PR which demonstrates the potential usability issues of Go’s zero values.

A particularly proud moment was when GOV.UK PaaS ran out of IP addresses that were available for provisioning backing services. I rectified this in a pull request to add more CIDR ranges.

After we finished up some work relating to auditing operator actions, I raised a documentation PR about how GOV.UK PaaS’s auditing system works, which included some Graphviz graphs that were fun to create.

GOV.UK

GOV.UK (the publishing website) has a microservice called router which is written in Go, unlike the rest of GOV.UK which is Ruby. A couple of colleagues and I instrumented router using Prometheus metrics. Prometheus’s multi-dimensional queries are very powerful, and the metrics are quite useful for measuring the reliability of the different GOV.UK “microservices”.

January / February 2020

Cloud Foundry

The CF Networking team are a joy to collaborate with, and I think this is very well demonstrated in a PR I raised to extend the Silk CNI to give operators more egress control. This PR was fun to work on technically, because container networking, and because the project team are so friendly, encouraging, and helpful.

GOV.UK PaaS

I raised a very mundane PR to configure Prometheus storage retention via BOSH properties. I like this PR because it is a nice demonstration of using RSpec to test YAML.

Concourse

In January 2020 Concourse created a new site to curate Concourse resource types. I raised a PR to add the Grafana annotation Concourse resource. This resource type is useful for correlating Concourse pipeline actions to metrics in Grafana.

October / November / December 2019

GOV.UK PaaS

Over the quieter winter holiday period, I re-implemented a Concourse pipeline linting tool in Go. The previous implementation was a Python script that no one in the team understood or looked at. I added RuboCop support, secret redaction support, and terminal colours. Using Go means that it can easily be installed using go get.

Cloud Foundry

Cloud Foundry has a service called Gorouter, which routes HTTP traffic. I raised a PR to extend tracing headers to support the W3C trace context standard.
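For context, the W3C standard carries trace information in a `traceparent` header made of four lowercase hex fields: version, trace ID, parent ID, and flags. The Ruby below is a minimal sketch of parsing and generating the version `00` form. It is not Gorouter's code (which is Go), and the helper names `parse_traceparent` and `new_traceparent` are hypothetical.

```ruby
require "securerandom"

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
TRACEPARENT_PATTERN = /\A([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})\z/

# Returns a hash of the header's fields, or nil if the header is invalid.
def parse_traceparent(header)
  match = TRACEPARENT_PATTERN.match(header)
  return nil unless match

  version, trace_id, parent_id, flags = match.captures
  return nil if version == "ff"                                # reserved, invalid version
  return nil if trace_id == "0" * 32 || parent_id == "0" * 16  # all-zero IDs are invalid

  {
    version: version,
    trace_id: trace_id,
    parent_id: parent_id,
    sampled: (flags.to_i(16) & 1) == 1 # lowest bit is the "sampled" flag
  }
end

# Builds a fresh header with random IDs, as a service might when no valid
# header arrives with the request.
def new_traceparent(sampled: true)
  format("00-%s-%s-%s", SecureRandom.hex(16), SecureRandom.hex(8), sampled ? "01" : "00")
end

incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
p parse_traceparent(incoming)
puts new_traceparent
```

The sketch is deliberately strict: it only accepts the four-field layout, whereas a production parser would also need to decide how to treat future versions.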

I fixed an interesting Gorouter bug which exposed the IP address of the load balancer. Prior to this PR I didn’t know that the Host header was optional in HTTP/1.0.

A colleague and I [raised a PR to improve the BOSH vm-strategy documentation](https://github.com/cloudfoundry/docs-bosh/pull/684) that we found very confusing. I like pairing with technical writers because they always ask you to explain things properly, and then you get better at explaining.

GOV.UK

GOV.UK (the publishing website) has a microservice called router which is written in Go, unlike the rest of GOV.UK which is Ruby. A colleague and I fixed a dormant bug which was awakened by switching GOV.UK PaaS’s load balancer from an AWS ELB to an ALB. We started using an ALB in HTTPS mode, which allowed us to support HTTP keep-alives. The router app had a zero value for MaxIdleConns and so never cleaned up idle connections, eventually hitting the open file limit.

July / August / September 2019

Cloud Foundry

I discovered and fixed a strange bug in the BOSH Director API when a VM is in both a VIP and a dynamic network.

A strange side effect of changing BOSH’s config server is that the SSH fingerprint format can change. This broke the CF conduit plugin, which had to be changed to support both fingerprint formats.
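The post does not say which two formats were involved. Assuming they were the two styles OpenSSH commonly prints (colon-separated MD5 hex and `SHA256:` followed by unpadded base64), accepting either could be sketched in Ruby as below. The helper names and the key blob are hypothetical, and the conduit plugin itself is written in Go.

```ruby
require "digest"

# key_blob_base64 is the base64 field of a public key line,
# i.e. the middle part of "ssh-rsa <blob> comment".

# MD5 style: 16 bytes rendered as colon-separated hex pairs.
def md5_fingerprint(key_blob_base64)
  blob = key_blob_base64.unpack1("m")
  Digest::MD5.hexdigest(blob).scan(/../).join(":")
end

# SHA256 style: "SHA256:" followed by base64 without "=" padding.
def sha256_fingerprint(key_blob_base64)
  blob = key_blob_base64.unpack1("m")
  "SHA256:" + [Digest::SHA256.digest(blob)].pack("m0").delete("=")
end

# Accepts a fingerprint in either style; tolerates an optional "MD5:" prefix.
def fingerprint_matches?(key_blob_base64, presented)
  candidates = [md5_fingerprint(key_blob_base64), sha256_fingerprint(key_blob_base64)]
  candidates.include?(presented.delete_prefix("MD5:"))
end

# Hypothetical key material, not a real SSH key.
blob = ["hypothetical key blob"].pack("m0")
puts md5_fingerprint(blob)
puts sha256_fingerprint(blob)
```

Comparing against both derived fingerprints means the caller does not need to know which format the server happened to produce.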

GOV.UK PaaS

GOV.UK PaaS brokers relationships between government services and infrastructure providers, and part of this process involves currency conversion. I added a metric to our metric microservice which tracks the European Central Bank’s USD to GBP exchange rate.

I also added metrics to monitor aggregate user activity:

We upgraded Cloud Foundry via cf-deployment to version 10. We did this via two pull requests and a maintenance window: