Pull requests
In this post I collect pull requests I have authored that I enjoyed writing or found particularly interesting to work on.
November 2020
November was my last month with GDS. I hope I left things better than I found them, and I know that I learned a lot along the way.
GOV.UK PaaS
November was my last month with GOV.UK PaaS. I spent some time with the irreplaceable @cmcnallygds, who turned my attempt at describing isolation segments into readable English.
I also added set-pipeline support to pipecleaner, which is a tool for linting Concourse pipelines.
GOV.UK Notify
The cell broadcasting work started to move faster, with a spate of PRs.
I’m very excited to see where this notification channel goes. As above, November was my last month with GDS. Notify will continue upwards and to the right, and it has been my privilege to work in the orbit of the core team.
October 2020
GOV.UK PaaS
RDS database users can now get RDS to upgrade their database to the latest minor version. Amazon doesn’t do this for you during the maintenance window unless the minor upgrade fixes a severe security CVE.
The GOV.UK PaaS metrics exporter is an interesting codebase. I had previously added some not very good integration tests, so a pull request using Gomega’s Eventually was overdue.
[BOSH](https://bosh.io) was not cleaning up tasks often enough, and when it did, it ran out of memory. After manually clearing out the tasks using the BOSH Ruby console, we ensured that BOSH task cleanup runs daily. Bonus points for the PR number.
As a team, we finally got around to merging our work on isolation segments, and we subsequently enabled egress-restricted isolation segments in London for the Document Checking Service.
I finally had enough of manually doing point-in-time restores for our tenants, and I’m sure our tenants were tired of raising support tickets for them. We added point-in-time database restores as a feature, built on RDS’s native point-in-time restore.
September 2020
GOV.UK PaaS
We allowed users to create read-only bindings to their Postgres databases, and enabled the conduit plugin to specify bind parameters.
It is always fun writing RSpec tests for the YAML configuration of your cloud of choice; I added some tests to ensure that all our components are highly available.
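The PR used RSpec; as a dependency-free sketch of the same idea, the check below walks a BOSH-style manifest and reports instance groups that are not highly available. It assumes the manifest has an `instance_groups` list with `name`, `instances` and `azs` keys, and it assumes "highly available" means at least two instances across at least two availability zones. The helper name and the sample manifest are hypothetical.

```ruby
require "yaml"

# Hypothetical helper: names of instance groups that are NOT highly available.
# Assumption: HA means >= 2 instances spread over >= 2 distinct AZs.
def non_ha_instance_groups(manifest)
  groups = manifest.fetch("instance_groups", [])
  not_ha = groups.reject do |group|
    instances = group.fetch("instances", 0)
    azs = group.fetch("azs", [])
    instances >= 2 && azs.uniq.length >= 2
  end
  not_ha.map { |group| group["name"] }
end

# Illustrative manifest fragment, not GOV.UK PaaS's real configuration.
manifest = YAML.safe_load(<<~YAML)
  instance_groups:
  - name: router
    instances: 3
    azs: [z1, z2, z3]
  - name: scheduler
    instances: 1
    azs: [z1]
YAML

p non_ha_instance_groups(manifest) # => ["scheduler"]
```

In an RSpec suite the same predicate would sit inside an `expect(...).to be_empty` style assertion, so a non-HA component fails the build rather than surfacing in production.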
August 2020
Cloud Foundry
I had a great deal of fun implementing a feature where a Cloud Foundry operator can customise the error pages within the routing subsystem: gorouter HTML error templates.
GOV.UK PaaS
We released autoscaling using the app-autoscaler. It was a great deal of fun to deploy the autoscaler within GOV.UK PaaS and to document how GOV.UK PaaS users can autoscale their apps.
Prometheus is nifty, and I will never pass up a chance to use the predict_linear function to generate alerts.
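predict_linear fits a straight line through the samples in a range vector using simple linear regression and extrapolates it a given number of seconds ahead; an illustrative alert is `predict_linear(node_filesystem_avail_bytes[1h], 4 * 3600) < 0`, meaning "the disk is on course to fill within four hours". The sketch below reproduces that calculation in Ruby under the assumption that samples are `[timestamp_seconds, value]` pairs. It is a model of the maths, not Prometheus's source code.

```ruby
# Least-squares fit through [timestamp, value] samples, extrapolated
# `seconds_ahead` beyond `eval_time` (a model of predict_linear).
def predict_linear(samples, eval_time, seconds_ahead)
  return nil if samples.length < 2 # a line needs at least two points

  n = samples.length.to_f
  xs = samples.map { |t, _| (t - eval_time).to_f } # time relative to evaluation
  ys = samples.map { |_, v| v.to_f }
  mean_x = xs.sum / n
  mean_y = ys.sum / n
  covariance = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  variance = xs.sum { |x| (x - mean_x)**2 }
  slope = covariance / variance
  intercept = mean_y - slope * mean_x # fitted value at eval_time
  intercept + slope * seconds_ahead
end

# Free space shrinking by 10 units every 10 minutes (made-up numbers).
samples = [[0, 100.0], [600, 90.0], [1200, 80.0]]
puts predict_linear(samples, 1200, 3600) # about 20.0 units left in an hour
```

Because the regression uses every sample in the window, a longer range such as `[1h]` smooths out short spikes at the cost of reacting more slowly to a sudden change in the fill rate.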
Ruby is a sharp knife, and it has a few ergonomic features that can be dangerous, for example execution strings:
```ruby
puts `echo hello world`
# is equivalent to
puts %x(echo hello world)
```
Rubocop is a Ruby linter which can be customised. I added a linting rule which rejects code that uses execution strings dangerously.
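The rule itself was a Rubocop cop; as a stand-in that needs only the standard library, the sketch below uses Ripper to find execution strings that interpolate values into the shell command. It assumes "dangerous" means interpolation (the usual route to shell injection) and that both backticks and `%x(...)` parse to the same `:xstring_literal` node. The method names are hypothetical.

```ruby
require "ripper"

# Hypothetical check: does the source contain a backtick or %x() execution
# string that interpolates a value (#{...}) into the command?
def dangerous_execution_strings?(source)
  contains_interpolated_xstring?(Ripper.sexp(source))
end

def contains_interpolated_xstring?(node)
  return false unless node.is_a?(Array) # leaves and parse failures (nil)

  if node[0] == :xstring_literal
    parts = node[1] || []
    # :string_embexpr is the node Ripper emits for #{...} interpolation.
    return true if parts.any? { |part| part.is_a?(Array) && part[0] == :string_embexpr }
  end

  node.any? { |child| contains_interpolated_xstring?(child) }
end

puts dangerous_execution_strings?('puts `echo hello world`') # => false
puts dangerous_execution_strings?('`rm -rf #{dir}`')         # => true
```

A constant command string is flagged as safe here; a real cop would probably also steer authors towards `system` or `Open3` with an argument list, which avoids the shell entirely.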
July 2020
GOV.UK PaaS
Each GOV.UK PaaS developer has their own development environment, which we endeavour to turn off when it is not needed. The development environments can be quite expensive, so we now use AWS spot instances:
- Fix BOSH instance creation when using tags
- Resurrect instances that are unresponsive
- Use spot instances in development environments
June 2020
GOV.UK Notify incident
Following an incident on GOV.UK Notify I raised the following PRs:
- alphagov/paas-cf - horizontally scale scheduler VM - an explanation of the incident
- cloudfoundry/cf-networking-release - bosh-dns-adapter/sdcclient: delay retries - the importance of retry logic
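To make "delay retries" concrete, here is a generic sketch of retrying with an exponentially increasing delay between attempts. It is not the bosh-dns-adapter code, and the PR’s actual delay strategy is not described here; `max_attempts`, `base_delay` and the injectable `sleeper` are illustrative parameters.

```ruby
# Retry the block, sleeping longer after each failure, so that a struggling
# upstream is not hammered by immediate retries.
def with_retries(max_attempts: 5, base_delay: 0.1, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue StandardError
    raise if attempt >= max_attempts # give up and surface the last error
    # Exponential backoff: 0.1s, 0.2s, 0.4s, ...
    sleeper.call(base_delay * 2**(attempt - 1))
    retry
  end
end

result = with_retries do |attempt|
  raise "transient failure" if attempt < 3 # simulate two failed lookups
  :ok
end
puts result # => ok
```

Immediate retries multiply load exactly when a dependency is already overloaded; a common refinement is to add random jitter to each delay so that many clients do not retry in lockstep.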
Somewhat related, I paired on a PR to reduce the number of metrics-related DNS requests by 10-20x via some short-TTL caching. This decreased p90 latency by more than 1s during peak traffic.
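The caching idea from that PR, in the abstract, is a small cache in front of the lookup whose entries expire after a short TTL. The sketch below is a single-threaded illustration, not the PR’s code; the class name, the resolver block and the injectable clock are assumptions, and a real implementation would need a mutex around the hash.

```ruby
# Short-TTL cache: repeated lookups within `ttl_seconds` are served from
# memory; after expiry the resolver block is called again.
class TTLCache
  def initialize(ttl_seconds, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }, &resolver)
    @ttl = ttl_seconds
    @clock = clock
    @resolver = resolver
    @entries = {} # key => [value, expires_at]
  end

  def fetch(key)
    now = @clock.call
    value, expires_at = @entries[key]
    return value if expires_at && now < expires_at

    value = @resolver.call(key) # miss or expired entry: do the real lookup
    @entries[key] = [value, now + @ttl]
    value
  end
end

cache = TTLCache.new(5) { |name| "resolved #{name}" } # hypothetical resolver
puts cache.fetch("metrics.internal") # first call resolves, later calls hit the cache
```

Even a TTL of a few seconds collapses a burst of identical lookups into one, which is where a 10-20x reduction in request volume can come from when many requests resolve the same names.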
GitHub management
I had a lot of fun writing a Concourse pipeline to help us manage Ruby versions across GitHub repos.
Concourse
Getting Concourse pipelines to manage themselves is a very useful feature of Concourse.
I raised a pull request to allow Concourse’s set-pipeline step to manage pipelines in other teams.
March / April / May 2020
GOV.UK PaaS
One of the scariest PRs I’ve raised changed how GOV.UK PaaS does automatic certificate rotation. It is related to a PR to ensure our CAs and certificates are generated correctly, which was raised after GOV.UK PaaS’s first P1 incident, caused by a certificate rotation bug.
GOV.UK PaaS has a service broker which provisions CDNs. I raised a PR which demonstrates the potential usability issues of Go’s zero values.
A particularly proud moment was when GOV.UK PaaS ran out of IP addresses that were available for provisioning backing services. I rectified this in a pull request to add more CIDR ranges.
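Running out of addresses is easy to see coming with a little arithmetic: a subnet with prefix length p holds 2^(32 − p) IPv4 addresses, and AWS reserves five addresses in every VPC subnet (the first four and the last). The sketch below totals usable addresses across a list of CIDR ranges; the helper name and the example ranges are hypothetical, and it handles IPv4 only.

```ruby
require "ipaddr"

# AWS reserves the first four and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5

# Hypothetical helper: usable IPv4 addresses across a set of AWS subnets.
def usable_addresses(cidrs)
  cidrs.sum do |cidr|
    prefix = IPAddr.new(cidr).prefix # e.g. 24 for "10.0.0.0/24"
    2**(32 - prefix) - AWS_RESERVED_PER_SUBNET
  end
end

p usable_addresses(["10.0.0.0/24"])                # => 251
p usable_addresses(["10.0.0.0/24", "10.0.1.0/24"]) # => 502
```

Comparing this figure against the number of provisioned backing services gives an early warning long before provisioning starts failing.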
After we finished some work on auditing operator actions, I raised a documentation PR about how GOV.UK PaaS’s auditing system works, which included some Graphviz graphs that were fun to create.
GOV.UK
GOV.UK (the publishing website) has a microservice called router, which is written in Go, unlike the rest of GOV.UK, which is Ruby.
A couple of colleagues and I instrumented router using Prometheus metrics.
Prometheus’s multi-dimensional queries are very powerful, and the metrics are quite useful for measuring the reliability of the different GOV.UK “microservices”.
January / February 2020
Cloud Foundry
The CF Networking team are a joy to collaborate with, and I think this is very well demonstrated in a PR I raised to extend the Silk CNI to give operators more egress control. This PR was fun to work on technically, because container networking, and because the project team are so friendly, encouraging, and helpful.
GOV.UK PaaS
I raised a very mundane PR to configure Prometheus storage retention via BOSH properties. I like this PR because it is a nice demonstration of using RSpec to test YAML.
Concourse
In January 2020 Concourse created a new site to curate Concourse resource types. I raised a PR to add the Grafana annotation Concourse resource. This resource type is useful for correlating Concourse pipeline actions to metrics in Grafana.
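For a sense of what such a resource does, the sketch below builds the JSON body that a pipeline step could POST to Grafana’s `/api/annotations` HTTP endpoint. The `time`, `timeEnd`, `tags` and `text` fields follow Grafana’s annotations API (timestamps in epoch milliseconds); the helper name and tag values are illustrative, and this is not the Concourse resource’s implementation. It only builds the payload and makes no network call.

```ruby
require "json"

# Hypothetical helper: JSON body for a Grafana annotation. Passing
# finished_at turns a point annotation into a region annotation.
def annotation_body(text:, tags:, started_at:, finished_at: nil)
  body = {
    "time" => (started_at.to_r * 1000).to_i, # epoch milliseconds
    "tags" => tags,
    "text" => text
  }
  body["timeEnd"] = (finished_at.to_r * 1000).to_i if finished_at
  JSON.generate(body)
end

puts annotation_body(text: "deploy-app started", tags: ["concourse", "deploy"],
                     started_at: Time.at(1_600_000_000))
```

Tagging annotations by pipeline or job lets a Grafana dashboard overlay only the deploys relevant to the metric being viewed.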
October / November / December 2019
GOV.UK PaaS
Over the quieter winter holiday period, I re-implemented a Concourse pipeline linting tool in Go. The previous implementation was a Python script that no one in the team understood or looked at. I added Rubocop support, secret redaction support, and terminal colours. Using Go means that it can easily be installed using go get.
Cloud Foundry
Cloud Foundry has a service called Gorouter, which routes HTTP traffic. I raised a PR to extend its tracing headers to support the W3C Trace Context standard.
I fixed an interesting Gorouter bug which exposed the IP address of the load balancer. Prior to this PR, I didn’t know that the Host header was optional in HTTP/1.0.
A colleague and I [raised a PR to improve the BOSH vm-strategy documentation](https://github.com/cloudfoundry/docs-bosh/pull/684), which we had found very confusing. I like pairing with technical writers because they always ask you to explain things properly, and then you get better at explaining.
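The W3C Trace Context `traceparent` header has the form `version-trace_id-parent_id-trace_flags`, for example `00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01`. The sketch below parses version-00 headers and builds a new one for the next hop, keeping the incoming trace ID and minting a fresh parent (span) ID, which is the usual behaviour for a proxy. The helper names are hypothetical, and this is not the Gorouter implementation.

```ruby
require "securerandom"

# Version 00 only: 2-hex version, 32-hex trace ID, 16-hex parent ID, 2-hex flags.
TRACEPARENT = /\A00-(?<trace_id>[0-9a-f]{32})-(?<parent_id>[0-9a-f]{16})-(?<flags>[0-9a-f]{2})\z/

def parse_traceparent(header)
  match = TRACEPARENT.match(header.to_s.strip)
  return nil unless match
  # The spec treats all-zero trace and parent IDs as invalid.
  return nil if match[:trace_id] == "0" * 32 || match[:parent_id] == "0" * 16

  match.named_captures.transform_keys(&:to_sym)
end

# Continue an existing trace (or start a new one) with a fresh span ID.
def new_traceparent(trace_id: SecureRandom.hex(16), sampled: true)
  parent_id = SecureRandom.hex(8) # 16 hex characters
  format("00-%s-%s-%02x", trace_id, parent_id, sampled ? 1 : 0)
end

incoming = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
puts new_traceparent(trace_id: incoming[:trace_id]) # same trace ID, new parent ID
```

Because the trace ID survives every hop, a trace started at the edge can be followed through the router and into application logs.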
GOV.UK
GOV.UK (the publishing website) has a microservice called router, which is written in Go, unlike the rest of GOV.UK, which is Ruby.
A colleague and I fixed a dormant bug which was awakened by switching GOV.UK PaaS’s load balancer from an AWS ELB to an ALB. We started using an ALB in HTTPS mode, which allowed us to support HTTP keep-alives. The router app had a zero value for MaxIdleConns and so never cleaned up idle connections, eventually hitting the open file limit.
July / August / September 2019
Cloud Foundry
I discovered and fixed a strange bug in the BOSH Director API when a VM is in both a VIP and a dynamic network.
A strange side effect of changing BOSH’s config server is that the SSH fingerprint format can change. This broke the CF conduit plugin, which had to be changed to support both fingerprint formats.
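Assuming the two formats in question are the ones OpenSSH has used (the legacy MD5 style, colon-separated hex, and the newer `SHA256:` style, unpadded base64), both can be computed from the same base64-decoded public key blob, so a client can accept either. The helper names below are hypothetical, and this is not the conduit plugin’s code.

```ruby
require "digest"

# Legacy style: MD5 digest as colon-separated hex pairs, e.g. "90:01:50:...".
def md5_fingerprint(key_blob)
  Digest::MD5.hexdigest(key_blob).scan(/../).join(":")
end

# Modern style: "SHA256:" followed by base64 of the digest, padding removed.
def sha256_fingerprint(key_blob)
  "SHA256:" + [Digest::SHA256.digest(key_blob)].pack("m0").delete("=")
end

# Accept an expected fingerprint in either format.
def fingerprint_matches?(key_blob, expected)
  [md5_fingerprint(key_blob), sha256_fingerprint(key_blob)].include?(expected)
end

blob = "abc" # stand-in for a real decoded public key blob
puts md5_fingerprint(blob) # => 90:01:50:98:3c:d2:4f:b0:d6:96:3f:7d:28:e1:7f:72
```

Matching against both formats keeps the client working whichever side of the change the server is on.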
GOV.UK PaaS
GOV.UK PaaS brokers relationships between government services and infrastructure providers, and part of this process involves currency conversion. I added a metric to our metrics microservice which tracks the European Central Bank’s USD to GBP exchange rate.
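The ECB publishes its reference rates as units of each currency per 1 EUR, so a USD to GBP rate is presumably derived as a cross rate through the euro: 1 USD is 1/EUR_USD euros, which is EUR_GBP/EUR_USD pounds. A minimal sketch follows; the helper name is illustrative and the sample rates are made up.

```ruby
# Cross rate through the euro: GBP per USD = (GBP per EUR) / (USD per EUR).
def usd_to_gbp(eur_usd:, eur_gbp:)
  eur_gbp / eur_usd
end

rate = usd_to_gbp(eur_usd: 1.10, eur_gbp: 0.88) # made-up rates
puts format("%.4f", rate) # prints 0.8000
```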
I also added metrics to monitor aggregate user activity:
- how many users have logged in in the last 30 days
- how many users do we have for each identity provider
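Both metrics above are simple aggregations over user records. The sketch below assumes each record carries an identity provider (here called `origin`) and a last-logon timestamp; in a Cloud Foundry deployment that data would presumably come from UAA, but the `User` struct and its field names are illustrative assumptions, not the exporter’s real types.

```ruby
# Hypothetical user record: identity provider plus last logon time (or nil).
User = Struct.new(:origin, :last_logon_at, keyword_init: true)

# How many users have logged in within the last `days` days.
def active_in_last(users, days:, now: Time.now)
  cutoff = now - days * 24 * 60 * 60
  users.count { |u| u.last_logon_at && u.last_logon_at >= cutoff }
end

# How many users each identity provider has, e.g. {"uaa" => 1, "google" => 2}.
def users_per_identity_provider(users)
  users.map(&:origin).tally
end

now = Time.utc(2019, 9, 30)
users = [
  User.new(origin: "uaa", last_logon_at: now - 5 * 86_400),
  User.new(origin: "google", last_logon_at: now - 45 * 86_400),
  User.new(origin: "google", last_logon_at: nil) # never logged in
]
puts active_in_last(users, days: 30, now: now) # => 1
p users_per_identity_provider(users)           # => {"uaa"=>1, "google"=>2}
```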
We upgraded Cloud Foundry to version 10 of cf-deployment. We did this with two pull requests and a maintenance window: