This originally appeared on Chris Glass’ blog
Running a mirror for your favorite distribution is much easier than it sounds when leveraging the right tools (and the work of others, including yours truly).
You might be interested to see what we (the Public Cloud team and Web Operations team at Canonical) developed over the years to mirror the Ubuntu archives for our cloud partners (taking a hybrid cache approach).
It has been available as free software from the start, but perhaps lacked visibility because it is not a top-level project in Launchpad.
There are generally two main strategies when it comes to archive mirroring: an actual mirror of everything, or a cache of everything.
For a full mirror, the go-to solution is either to roll your own sync scripts or to use something like ubumirror (as I’ve written about in a previous blog post) to rsync all of the archive’s contents to a local disk.
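This might be a good solution for places with very limited bandwidth (the initial sync can be done with a good old hard drive, for example), but it has drawbacks: it needs enough disk space for the entire archive, and clients can see inconsistent metadata while a sync is in progress.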
For a cache, the usual approach is to run squid or another caching proxy specifically configured to understand the layout of Debian packages and metadata. One such project is squid-deb-proxy, which will happily cache Debian packages in a sensible way.
While more viable than a full mirror, this approach currently still suffers from the same metadata inconsistency problem.
The ubuntu-repository-cache charm takes a hybrid approach: the metadata (roughly, everything except pool/) is mirrored locally on a schedule, and only served to clients once its internal consistency has been established. As a result, the package indices never produce “hashsum mismatch” errors.
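The charm’s own consistency check isn’t reproduced here, but the idea can be sketched in shell. The hypothetical verify_release function below is a minimal sketch that assumes a mirrored dists/<series> directory laid out like the upstream archive. It compares every index file listed in the Release file’s SHA256 section against its actual checksum, skipping listed variants that aren’t present on disk:

```shell
# Hypothetical sketch, not the charm's code: succeed only if every index file
# listed in the SHA256 section of <suite_dir>/Release matches its checksum.
# Listed files that are not present (e.g. compression variants that were not
# mirrored) are skipped. Requires GNU coreutils 8.25+ for --ignore-missing.
verify_release() {
    suite_dir=$1
    [ -f "$suite_dir/Release" ] || return 1
    (
        cd "$suite_dir" || exit 1
        # Release lines look like " <sha256> <size> <path>" under "SHA256:".
        awk '/^SHA256:/        { in_sha = 1; next }
             /^[^ ]/           { in_sha = 0 }
             in_sha && NF == 3 { print $1 "  " $3 }' Release |
            sha256sum --check --quiet --ignore-missing
    )
}
```

A scheduler built this way would only publish a freshly synced snapshot when verify_release succeeds, and keep serving the previous one otherwise. Note that this only checks hashes; a complete implementation would also verify the Release file’s GPG signature.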
The pool/ part of the archive is then cached using squid3. With default settings, this means disk space requirements are much lower than for a full mirror, and the hit rate is much better than when mirroring everything under the sun (most packages have a very small chance of being used, even more so in a cloud environment).
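Conceptually, the split between the two halves of the archive boils down to a path-based routing decision. The route_request function is hypothetical and only illustrates the rule stated above; it assumes the archive is served under /ubuntu/ and is not how the charm’s web server configuration is actually written:

```shell
# Conceptual sketch (hypothetical function, not the charm's real config):
# decide which backend answers an archive request under the hybrid model.
route_request() {
    case $1 in
        */ubuntu/pool/*) echo squid ;;     # package files: cached on demand by squid3
        */ubuntu/*)      echo metadata ;;  # indices etc.: served from the verified local copy
        *)               echo unknown ;;   # anything outside the archive
    esac
}

# Example: route_request /ubuntu/dists/xenial/InRelease   prints "metadata"
```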
As an added benefit, running the charm means you’re running the exact same software that our infrastructure team runs in the biggest public clouds in the world, and you benefit from the same vigilance and expertise that we apply to our own systems. For free.
For this deployment we’ll first need to configure Juju. If it’s not your first time playing with Juju, you can skip right ahead to the juicy bits 🙂
This is intended to kick-start a deployment on local LXD containers (as an example), but it will of course work on any other cloud supported by Juju. For a more detailed introduction, please refer to the Juju documentation.
To make sure we’re up-to-date, let’s install juju from the snap package:
sudo snap install --classic juju
Juju comes with a “localhost” cloud leveraging LXD. Since that is free (both as in speech and as in beer), we’ll use it as a reference cloud, but the instructions should work for any other supported cloud provider (see
# This gives you a local LXD backed juju environment
juju bootstrap localhost
Let’s ask our test deployment to only care about xenial, in order to speed up initial sync with the upstream archives and save disk space:
cat > cache.yaml << EOF
ubuntu-repository-cache:
  mirror-series: xenial
EOF
Each series downloads around 2.4 GB of metadata on creation, and then downloads it again every hour. Since at most two copies of the metadata are kept on disk, plan for about 5 GB per series.
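One way to keep at most two copies while never exposing a half-downloaded one is to sync each refresh into a new directory and then atomically repoint a symlink. The sketch below is an illustration, not the charm’s actual mechanism. It assumes GNU coreutils, snapshot directories named snapshot-<timestamp> so that they sort chronologically, paths without whitespace, and a web server that serves metadata through the hypothetical current link:

```shell
# Hypothetical sketch, not the charm's code: publish a verified metadata
# snapshot by atomically repointing a "current" symlink, then prune so that
# at most two snapshots remain on disk. Requires GNU coreutils (mv -T, head -n -N).
publish_snapshot() {
    root=$1   # directory holding snapshot-* directories and the "current" link
    new=$2    # name (relative to $root) of the freshly synced, verified snapshot

    # Build the new link under a temporary name, then rename it over "current".
    # rename(2) is atomic, so clients never see a missing or half-updated link.
    ln -sfn "$new" "$root/current.tmp" &&
        mv -fT "$root/current.tmp" "$root/current" || return 1

    # Snapshot names sort chronologically; delete all but the two newest.
    printf '%s\n' "$root"/snapshot-* | sort | head -n -2 | xargs -r rm -rf
}
```

Because the previous snapshot is only deleted once a newer one is live, a failed or inconsistent hourly sync simply leaves the last good copy in place.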
As usual in Juju land, the actual deployment couldn’t be easier!
juju deploy --config=cache.yaml ubuntu-repository-cache
juju deploy haproxy
juju add-relation ubuntu-repository-cache haproxy
juju expose haproxy
If you’re using a local deployment, you can see that Juju spawned three LXD containers: the Juju controller, an HAProxy machine and the archive charm itself.
You can then scale the archive cluster up by simply running:
juju add-unit ubuntu-repository-cache
Simply pointing apt at the newly exposed HAProxy (public) IP address should just work!
Here’s an example snippet to add to your sources.list configuration file:
deb http://<HAproxy's IP address>/ubuntu/ xenial main universe
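If you configure several machines, a small helper saves some typing. write_cache_source is a hypothetical function, and its default target file under /etc/apt/sources.list.d/ is an assumption (apt reads any *.list file in that directory); it writes the line above for a given HAProxy address:

```shell
# Hypothetical helper: point apt at the cache's HAProxy front end.
# The default file name is an assumption; any *.list file under
# /etc/apt/sources.list.d/ is read by apt.
write_cache_source() {
    haproxy_ip=$1
    target=${2:-/etc/apt/sources.list.d/ubuntu-repository-cache.list}
    if [ -z "$haproxy_ip" ]; then
        echo "usage: write_cache_source <haproxy-ip> [target-file]" >&2
        return 1
    fi
    # Same suite and components as the example sources.list line.
    printf 'deb http://%s/ubuntu/ xenial main universe\n' "$haproxy_ip" > "$target"
}
```

Run it as root (writing under /etc/apt requires it) with the public address reported by `juju status haproxy`, then refresh the package lists with `sudo apt-get update`.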
This is the charm that serves the Ubuntu archives for most of the cloud instances you boot on the major cloud providers. We have one deployment per cloud region, and make sure cloud-init sets the default Ubuntu archive address to point at them when relevant.
For a production deployment we use at least 2 ubuntu-repository-cache instances (juju deploy -n 2 ubuntu-repository-cache) behind 2 HAproxy instances (juju deploy -n 2 haproxy) that are balanced with DNS round-robin.
The squid cache space is computed based on available memory and disk space.
On a machine with 12 GB of RAM and a 300 GB root disk, we observe the following usage: about 200 GB of disk space dedicated to package caching, plus about 22 GB of disk for the default series metadata (the default behavior, in other words what you get when deploying without a configuration file).
You can replicate this setup easily with the following deployment command:
juju deploy --constraints "mem=12G root-disk=300G" ubuntu-repository-cache
This unfortunately doesn’t work with the local LXD provider right now, but should work with most other clouds supported by Juju. At the time of writing, the LXD substrate does not honour disk size constraints, so all LXD containers are created with a 10 GB root disk regardless of what is specified. I’m sure the problem will be fixed in a future version.
Don’t hesitate to leave a comment on Reddit!