For several months, I’ve been working with the maintainers of OCaml.org projects to define and document the governance structure around the domain name. I wrote about this previously and I’m pleased to say that the work for this phase has concluded, with the document now live.
There were some recurring themes that cropped up during my email discussions with people and I thought it would be useful to present a summary of them, along with my thoughts. Broadly, the discussions revolved around the philosophy of the document, the extent of its scope, and the depth of coverage. This discourse was very important for refining and improving the document.
Some of the comments I received were essentially that the document did not represent how we should be organising ourselves. There was occasionally the sense (to me at least) that the only appropriate form of governance is a fully democratic and representational one.
That would entail things like official committees, ensuring that various communities/organisations were represented, and perhaps establishing some form of electoral processes. Overall, something relatively formal and quite carefully structured. Of course, instituting such an arrangement would necessarily require somewhat involved procedures, documentation, and systems — as well as the volunteer time to manage those processes.
These may be noble aims — and I expect one day we’ll be closer to such ideals — but one of the critical factors for the current approach was that we record how things are right now. In my experience, anything else is purely aspirational and therefore would have little bearing with how things currently function.
To put it another way, the current document must not describe the structure we desire to have, but the organisation we actually have — warts and all. Yes, right now we have a BDFL*, who personally owns the domain and therefore can do as he pleases with it. Irrespective of this, the community has been able to come together, coordinate themselves, and build very useful things around the domain name. This has happened independently of any formal community processes and, in my view, has largely been driven by people supporting each other’s works and generally trying to ‘do the right thing’.
Another aspect to point out is that is that such documents and procedures are not necessary for success. This is obvious when you consider how far the OCaml community has come in such a relatively short space of time. Given this, one might argue why we need any kind of written governance at all.
To answer that, I would say that once things grow beyond a certain scale, I believe it helps to gather the implicit behaviours and document them clearly. This allows us to be more systematic in our approach and also enables newcomers to understand how things work and become involved more quickly. In addition, having a clear record of how things operate in the present is an invaluable tool in helping to clarify what exactly we should work on changing for the future.
It’s a little confusing to consider that ‘OCaml.org’ is simultaneously a collection of websites, infrastructural components, and projects. Disambiguating these from the wider OCaml community was important, and relatively straightforward, but there were a few questions about the relationship between the domain name and the projects that use it.
Although the governance covers the OCaml.org domain name, it necessarily has an impact on the projects which make use of it. This matters because anything under the OCaml.org domain will, understandably, be taken as authoritative by users at large. In a way, OCaml.org becomes the sum of the projects under it, hence it’s necessary to have some lightweight stipulations about what is expected of those projects.
Projects themselves are free to organise as they wish (BDFL/Democracy/etc) but there are certain guiding principles for OCaml.org that those projects are expected to be compatible with (e.g. openness, community-related, comms, etc). These stipulations are already met by the current projects, so codifying them is intended to clarify expectations for new projects.
Another of the recurring points was how the current document didn’t capture every eventuality. Although I could have attempted this, the end result would have been a lengthy document, full of legalese, that I expect very few people would ever read. The document would also have needed to cover eventualities that have not occurred (yet) and/or may be very unlikely to occur.
Of course, this is not a legal document. No-one can be compelled to comply with it and there are very few sanctions for anyone who chooses not to comply. However, for those who’ve agreed to it, acceptance signals a clear intent to take part in a social contract with the others involved in work around the domain name.
Overall, I opted for a lightweight approach that would cover how we typically deal with issues and result in a more readable document. Areas that are ‘unchartered’ for us should be dealt with as they have been so far — through discussion and action — and can subsequently be incorporated when we have a better understanding of the issues and solutions.
The current version of the governance document is now live and it is very much intended to be a living document, representing where we are now. As the community continues to grow and evolve, we should revisit this to ensure it is accurate and is meeting our needs.
I look forward to seeing where the community takes it!
In case you’re interested, the set of links below covers the journey from beginning to end of this process.
* Yeah, I made sure to add Xavier to the BDFL list before publishing this. :)
Thanks to Ashish, Philippe and Anil for comments on an earlier draft.Share / Comment
Updated: 14 July (see below)
Above are my slides from a talk at PolyConf this year. I was originally going to talk about the MISO tool stack and personal clouds (i.e. how we’ll build towards Nymote) but after some informal conversations with other speakers and attendees, I thought it would be way more useful to focus the talk on unikernels themselves — specifically, the ‘M’ in MISO. As a result, I ended up completely rewriting all my slides! Since I pushed this post just before my talk, I hope that I’m able to stick to the 30min time slot (I’ll find out very soon).
In the slides I mention a number of things we’ve done with MirageOS so I thought it would be useful to list them here. If you’re reading this at the conference now, please do give me feedback at the end of my talk!
The mirage-skeleton repo, which has a number of examples
To get involved in the development work, please do join the MirageOS devel list and try out some of the examples for yourselves!
The video of the talk is now available and it’s embedded below. Overall, the talk seemed to go well and there was enough time for questions.
At the end of the talk, I asked people to give me feedback and shared a URL, where I had a very short form. I had 21 responses with a rating of 4.52/5.00. I’m quite pleased with this and the feedback was also useful. In a nutshell, the audience seemed to really appreciate the walkthrough (which encourages me to make some screencasts). There was one comment that I didn’t do enough justice to the security benefits. Specifically, I could have drawn more reference to the OCaml TLS work, which prevents bugs like heartbleed. Security is definitely one of the key benefits of MirageOS unikernels (see here), so I’ll do more to emphasise that next time.
Here’s the video and I should mention that the slides seem to be a few seconds ahead. You’ll notice that I’ve left the feedback link live, too. If you’d like to tell me what you think of the talk, please do so! There are some additional comments at the end of this post.
Finally, here are few things I should clarify:
In the previous post I described the continuous end-to-end system
that we’ve set up for some of the MirageOS projects — automatically going from
git push all the way to live deployment, with everything under
Everything I described previously already exists and you can set up the workflow for yourself, the same way many others have done with the Travis CI scripts for testing/build. However, there are a range of exciting possibilities to consider if we’re willing to extrapolate just a little from the tools we have right now. The rest of this post explores these ideas and considers how we might extend our system.
Previously, we had finished the backbone of the workflow and I discussed a few ideas about how we should flesh it out — namely more testing and some form of logging/reporting. There’s substantially more we could do when we consider how lean and nimble unikernels are, especially if we speculate about the systems we could create as our toolstack matures. A couple of things immediately come to mind.
The first is the ability to boot a unikernel only when it is required, which opens up the possibility of highly-elastic infrastructure. The second is the ease with which we can push, pull or otherwise distribute unikernels throughout a system, allowing new forms of deployment to both cloud and embedded systems. We’ll consider these in turn and see where they take us, comparing with the current ‘mirage-decks’ deployment I described in Part 1.
The way cloud services are currently provisioned means that you may have services operating and consuming resources (CPU, memory, etc), even when there is no demand for them. It would be significantly more efficient if we could just activate a service when required and then shut it down again when the demand has passed. In our case, this would mean that when a unikernel is ‘deployed to production’, it doesn’t actually have to be live — it merely needs to be ready to boot when demand arises. With tools like Jitsu (Just-In-Time Summoning of Unikernels), we can work towards this kind of architecture.
Jitsu allows us to have unikernels sitting in storage then ‘summon’ them into existence. This can occur in response to an incoming request and with no discernible latency for the requester. While unikernels are inactive, they consume only the actual physical storage required and thus do not take up any CPU cycles, nor RAM, etc. This means that more can be achieved with fewer resources and it would significantly improve things like utilization rates of hardware and power efficiency.
In the case of the decks.openmirage.org unikernel that I discussed last time, it would mean that the site would only come online if someone had requested it and would shut down again afterwards.
In fact, we’ve already been working on this kind of system and Jitsu will be presented at NSDI in Oakland, California this May. In the spirit of looking ahead, there’s more we could do.
At the moment, Jitsu lets you set up a system where unikernels will boot in response to incoming requests. This is already pretty cool but we could take this a step further. If we can boot unikernels on demand, then we could use that to build a system which can automate the scale-out of those services to match demand. We could even have that system work across multiple machines, not just one host. So how would all this look in practice for ‘mirage-decks’?
Our previous toolchain automatically boots the new unikernel as soon as it is pulled from the git repo. Using Jitsu, our deployment machine would pull the unikernel but leave it in the repo — it would only be activated when someone requests access to it. Most of the time, it may receive no traffic and therefore would remain ‘turned off’ (let’s ignore webcrawlers for now). When someone requests to see a slide deck, the unikernel would be booted and respond to the request. In time it can be turned off again, thus freeing resources. So far, so good.
Now let’s say that a certain slide deck becomes really popular (e.g. posted to HackerNews or Reddit). Suddenly, there are many incoming requests and we want to be able to serve them all. We can use the one unikernel, on one machine, until it is unable to handle the load efficiently. At this point, the system can create new copies of that unikernel and automatically balance across them. These unikernels don’t need to be on the same host and we should be able to spin them up on different machines.
To stretch this further, we can imagine coordinating the creation of those new unikernels nearer the source of that demand, for example starting off on a European cloud, then spinning up on the East coast US and finally over to the West coast of the US. All this could happen seamlessly and the process can continue until the demand passes or we reach a predefined limit — after all, given that we pay for the machines, we don’t really want to turn a Denial of Service into a Denial of Credit.
After the peak, the system can automatically scale back down to being largely dormant — ready to react when the next wave of interest occurs.
If you think this is somewhat fanciful, that’s perfectly understandable — as I mentioned previously, this post is very much about extrapolating from where the tools are right now. However, unikernels actually make it very easy to run quick experiments which indicate that we could iterate towards what I’ve described.
A recent and somewhat extreme experiment ran a unikernel VM for each URL. Every URL on a small static site was served from its own, self-contained unikernel, complete with it’s own web server (even the ‘rss.png’ icon was served separately). You can read the post to see how this was done and it also led to an interesting discussion on the mailing list (e.g. if you’re only serving a single item, why use a web server at all?). Of course, this was just an experiment but it demonstrates what is possible now and how we can iterate, uncover new problems, and move forward. One such question is how to automatically handle networking during a scale-out, and this is an area were tools like Signpost can be of use.
Overall, the model I’ve described is quite different to the way we currently use the cloud, where the overhead of a classic OS is constantly consuming resources. Although it’s tempting to stick with the same frame of reference we have today we should recognise that the current model is inextricably intertwined with the traditional software stacks themselves. Unikernels allow completely new ways of creating, distributing and managing software and it takes some thought in order to fully exploit their benefits.
For example, having a demand-driven system means we can deliver more services from just the one set of physical hardware — because not all those services would be consuming resources at the same time. There would also be a dramatic impact on the economics, as billing cycles are currently measured in hours, whereas unikernels may only be active for seconds at a time. In addition to these benefits, there are interesting possibilities in how such scale-outs can be coordinated across different devices.
As we move to a world with more connected devices, the software and services we create will have to operate across both the cloud and embedded systems. There have been many names for this kind of distributed system, ranging from ubiquitous computing to dust clouds and the ‘Internet of Things’ but they all share the same idea of running software at the edges of the network (rather than just cloud deployments).
When we consider the toolchain we already have, it’s not much of a stretch to imagine that we could also build and store a unikernel for ARM-based deployments. Those unikernels can be deployed onto embedded devices and currently we target the Cubieboard2.
We could make such a system smarter. Instead of having the edge devices constantly polling for updates, our deployment process could directly push the new unikernels out to them. Since these devices are likely to be behind NATs and firewalls, tools like Signpost could deal with the issue of secure connectivity. In this way, the centralized deployment process remains as a coordination point, whereas most of the workload is dealt with by the devices the unikernels are running on. If a central machine happens to be unavailable for any reason, the edge-devices would continue to function as normal. This kind of arrangement would be ideal for Internet-of-Things style deployments, where it could reduce the burden on centralised infrastructure while still enabling continuous deployment.
In this scenario, we could serve the traffic for ‘mirage-decks’ from a unikernel on a Cubieboard2, which could further minimise the cost of running such infrastructure. It could be configured such that if demand begins to peak, then an automated scale-out can occur from the Cubieboard2 directly out onto the public cloud and/or other Cubieboards. Thus, we can still make use of third-party resources but only when needed and of the kind we desire. Of course, running a highly distributed system leads to other needs.
When running services at scale it becomes important to track the activity and understand what is taking place in the system. In practice, this means logging the activity of the unikernels, such as when and where they were created and how they perform. This becomes even more complex for a distributed system.
If we also consider the logging needs of a highly-elastic system, then another problem emerges. Although scaling up a system is straightforward to conceptualise, scaling it back down again presents new challenges. Consider all the additional logs and data that have been created during a scale-out — all of that history needs to be merged back together as the system contracts. To do that properly, we need tools designed to manage distributed data structures, with a consistent notion of merges.
Irmin addresses these kinds of needs and it enables a style of programming very similar to the Git workflow, where distributed nodes fork, fetch, merge and push data between each other. Building an end-to-end logging system with Irmin would enable data to be managed and merged across different nodes and keep track of activity, especially in the case of a scale down. The ability to capture such information also means the opportunity to provide analytics to the creators of those unikernels around performance and usage characteristics.
The use of Irmin wouldn’t be limited to logging as the unikernels themselves could use it for managing data in lieu of other file systems. I’ll refrain from extrapolating too far about this particular tool as it’s still under rapid development and we’ll write more as it matures.
You may have noticed that one of the benefits of the unikernel approach arises because the artefacts themselves are not altered once they’re created. This is in line with the recent resurgence of ideas around ‘immutable infrastructure’. Although there isn’t a precise definition of this, the approach is that machines are treated as replaceable and can be regularly re provisioned with a known state. Various tools help the existing systems to achieve this but in the case of unikernels, everything is already under version control, which makes managing a deployment much easier.
As our approach is already compatible with such ideas, we can take it a step further. Immutable infrastructure essentially means the artefact produced doesn’t matter. It’s disposable because we have the means to easily recreate it. In our current example, we still ship the unikernel around. In order to make this ‘fully immutable’, we’d have to know the state of all the packages and code used when building the unikernel. That would give us a complete manifest of which package versions were pulled in and from which sources. Complete information like this would allow us to recreate any given unikernel in a highly systematic way. If we can achieve this, then it’s the manifest which generates everything else that follows.
In this world-view, the unikernel itself becomes something akin to caching. You use it because you don’t want to rebuild it from source — even though unikernels are quicker to build than a whole OS/App stack. For more security critical applications, you may want to be assured of the code that is pulled in, so you examine the manifest file before rebuilding for yourself. This also allows you to pin to specific versions of libraries so that you can explicitly adjust the dependencies as you wish. So how do we encode the manifest? This is another area where Irmin can help as it can keep track of the state of package history and can recreate the environment that existed for any given build run. That build run can then be recreated elsewhere without having to manually specify package versions.
There’s a lot more to consider here as this kind of approach opens up new avenues to explore. For the time being, we can recognise that the unikernel approach lends itself to the achieving immutable infrastructure.
As I mentioned at the beginning of this post, most of what I’ve described is speculative. I’ve deliberately extrapolated from where the tools are now so as to provoke more thoughts and discussion about how this new model can be used in the wild. Some of the things we’re already working towards but there are many other uses that may surprise us — we won’t know until we get there and experimenting is half the fun.
We’ll keep marching on with more libraries, better tooling and improving quality. What happens with unikernels in the rest of 2015 is largely up to the wider ecosystem.
That means you.
Thanks to Thomas Gazagnaire and Richard Mortier for comments on an earlier draft.Share / Comment
In my Jekyll to Unikernel post, I described an automated workflow that would take your static website, turn it into a MirageOS unikernel, and then store that unikernel in a git repo for later deployment. Although it was written from the perspective of a static website, the process was applicable to any MirageOS project. This post covers how things have progressed since then and the kind of automated, end-to-end deployments that we can achieve with unikernels.
If you’re already familiar with the above-linked post then it should be clear that this will involve writing a few more scripts and ensuring they’re in the right place. The rest of this post will go through a real world example of such an automated system, which we’ve set up for building and deploying the unikernel that serves our slide decks — mirage-decks. Once you’ve gone though this post, you should be able to recreate such a workflow for your own needs. In Part 2 of this series I’ll build on this post and consider what the possibilities could be if we extended the system using some of our other tools — thus arriving at something very much like our own Heroku for Unikernels.
Almost all of our OCaml projects now use Travis CI for build and testing (and deployment). In fact, there are so many libraries now that we recently put together an OCaml Travis Skeleton, which means we don’t have to manually keep the scripts in sync across all our repos — and fewer copy/paste/edits means fewer mistakes.
If you’re familiar with the build scripts from last time, then
you can browse the new scripts and you’ll see that they’re broadly similar.
In many cases you may well be able to depend on one or other of the scripts
directly and for a handful of scenarios, you can fork and patch them to
suit you (i.e. for MirageOS unikernels). We can do this because we’ve made it
quick to set up an OCaml environment using an Ubuntu PPA. The rest
of the work is done by the
mirage tool itself so once that’s installed, the
build process becomes fairly straightforward. The complexity around secure
keys was also covered last time, which allowed us to commit the
final unikernel to a deployment repo. That means the remaining step is
to automate the deployment itself.
Committing the unikernel to a deployment repo is where the previous post ended
and a number of people forged ahead and wrote about their
experiences deploying onto AWS and Linode. Many of these deployments
(understandably) involve a number of quite manual steps. It would be
particularly useful to construct a set of scripts that can be fully automated,
such that a
git push to a repo will automatically run through the cycle of
building, testing, storing and activating a new unikernel. We’ve done
exactly this with some of our repos and this post will talk through those
MirageOS unikernels can currently be built for Xen and Unix backends. This is a straightforward step and typically the build matrix is already set up to test that both of them build as expected. For this post, I’ve only considered the Xen backend as that’s our chosen deployment method but it would be equally feasible to deploy the unix-based unikernels onto a *nix machine in much the same way. In this sense, you get to choose whether you want to deploy the unikernels onto a Hypervisor (for isolation and security) or whether running them as unix-processes better suits your needs. The unikernel approach means that both options are open to you, with little more than a command-line flag between them.
In terms of the deployment machines there are several options to consider. The most obvious is to set up a dedicated host, where you have full access to the machine and can install Xen. Another is to have a machine running on EC2 and create scripts to deal with unikernels. You could also build and deploy onto Xen on the Cubieboard2. If you’d rather test out the complete system first, you could set up an appropriate machine in Virtualbox to work with.
For our workflow, we use Xen unikernels which we deploy to a dedicated host. For the sake of brevity, I won’t go into the details of how to set up the machine but you can follow the instructions linked above.
Decks is the source repo that holds many of our slides, which we’ve presented at conferences and events over the years (I admit that I have yet to add mine). The repo compiles to a unikernel that can then serve those slides, as you see at decks.openmirage.org. For maximum fun-factor, we usually run that unikernel from a Cubieboard2 when giving talks.
The toolchain for this unikernel includes build, store and deploy. We’ll recap the first two steps before going through the final one.
Build — In the root of the decks source repo, you’ll notice the
.travis.yml file, which fetches the standard build script mentioned earlier.
Building the unikernel proceeds according to the options in the build matrix.
language: c install: wget https://raw.githubusercontent.com/ocaml/ocaml-travisci-skeleton/master/.travis-mirage.sh script: bash -ex .travis-mirage.sh env: matrix: - OCAML_VERSION=4.02 MIRAGE_BACKEND=unix MIRAGE_NET=socket - OCAML_VERSION=4.02 MIRAGE_BACKEND=unix MIRAGE_NET=direct - OCAML_VERSION=4.02 MIRAGE_BACKEND=xen MIRAGE_ADDR="184.108.40.206" MIRAGE_MASK="255.255.255.128" MIRAGE_GWS="220.127.116.11" DEPLOY=1 global: - secure: ".... encrypted data ...." - secure: ".... encrypted data ...." - secure: ".... encrypted data ...." ...
In this case, two builds occur for Unix and one for Xen with different parameters being used for each. If you look at the actual travis file, you’ll notice there are 26 lines of encrypted data. This is how we pass the deployment key to Travis CI, so that it has push access to the separate mirage-decks-deployment repo. You can read the section in the previous post to see how we send Travis a private key.
Store — One of the combinations in the build matrix (configured for Xen), is intended for deployment. When that unikernel is completed, an additional part of the script is triggered that pushes it into the deployment repo.
After the ‘build’ and ‘store’ steps above, we have a
deployment repository with a collection of Xen unikernels. For
this stage, we have a new set of scripts that live in this repo alongside those
unikernels. Specifically, you’ll notice a folder called
contains four files.
. ├── Makefile ├── README.md ├── scripts │ ├── crontab │ ├── deploy.sh │ ├── install-hooks.sh │ └── post-merge.hook ...
A quick summary of the setup is that we clone the repo onto our deployment
machine and install some hooks there. Then a simple cronjob will perform
git pull at regular intervals. If a merge event occurs, then it means the
repo has been updated and another script is triggered. That script removes the
currently running unikernel and boots the latest version from the repo. It’s
fairly straightforward and I’ll explain what each of the files does below.
Makefile - After cloning the repo, run
make install. This will trigger
install-hooks.sh to set things up appropriately. It’s worth remembering that
from this point on, the git repo on the deployment machine will not be
identical to the deployment repo on GitHub.
install-hooks.sh — The first two lines ensure that the commands
will be run from the root of the git repo. The third line symlinks the
post-merge.hook file into the appropriate place within the
This is the folder where customized git hooks need to be placed in
order to work. The final line adds the file
scripts/crontab to the
deployment machine’s list of cron jobs.
ROOT=$(git rev-parse --show-toplevel) # obtain path to root of repo cd $ROOT # symlink the post-merge.sh file into the .git/hooks folder ln -sf $ROOT/scripts/post-merge.hook $ROOT/.git/hooks/post-merge crontab scripts/crontab # add to list of cron jobs
crontab — This file is a cronjob that sets up the deployment machine to
git pull on the deployment repo at regular intervals. Changing the
file in the repo will ultimately cause it to be updated on the deployment
deploy.sh). At the moment, it’s set to run every 11 minutes.
*/11 * * * * cd $HOME/mirage-decks-deployment && git pull
post-merge.hook — Since we’ve already run the Makefile, this file is
symlinked from the appropriate place on the deployment machine’s copy of the
repo. When a
git pull results in new commits being downloaded and merged,
then this script is triggered immediately afterwards. In this case, it just
ROOT=$(git rev-parse --show-toplevel) # obtain path to root of repo exec $ROOT/scripts/deploy.sh # execute the deploy script
deploy.sh — This is where the work actually happens and you’ll notice that there really isn’t much to do! I’ve commented in the code below to explain what’s going on.
VM=mir-decks XM=xm ROOT=$(git rev-parse --show-toplevel) cd $ROOT crontab scripts/crontab # Update cron scripts # Identify the latest build in the repo and then use # the generic Xen config script to construct a # specific file for this unikernel. Essentially, # 'sed' just does a find/replace on two elements and # the result is written to a new file. # KERNEL=$ROOT/xen/`cat xen/latest` sed -e "s,@VM@,$VM,g; s,@KERNEL@,$KERNEL/$VM.xen,g" \ < $XM.conf.in \ >| $KERNEL/$XM.conf # Move into the folder with the latest unikernel. # Remove any uncompressed Xen images found there # (since we may be starting a rebuilt unikernel). # Unzip the compressed unikernel. # cd $KERNEL rm -f $VM.xen bunzip2 -k $VM.xen.bz2 # Instruct Xen to remove the currently running # unikernel and then start up the new one we # just unzipped. # sudo $XM destroy $VM || true sudo $XM create $XM.conf
At this point, we now have a complete system! Of course, this arrangement isn’t perfect and there are number of things we could improve. For example, it depends on a cron job, which means it may take a while before a new unikernel is live. Replacing this with something triggered on a webhook could be an improvement, but it does mean exposing an end-point to the internet. The scripts will also redeploy the current unikernel, even if the only change is to the crontab schedule. Some extra work in the deploy script, using some git tools, might work around this.
Despite these minor issues, we do have a completely end-to-end workflow that takes us all the way from pushing some new changes to deploying a new unikernel! An additional feature is that everything is checked into version control. Right from the scripts to completed artefacts (including a method of transmitting secure keys/data, over public systems).
There is minimal work done outside the code you’ve already seen, though there is obviously some effort involved in setting up the deployment machine. However, as mentioned earlier, you could either use the unix-based unikernels or experiment with Virtualbox VM with Xen just to test out this entire toolchain.
Overall, we’ve only added around 20 lines of code to the initial 50 or so that
we use for the Travis CI build. So for less than 100 lines of code, we have
a complete end-to-end system that can take a MirageOS project from a
git push, all the way through to a live deployment.
In our current system, if the unikernel builds appropriately then we just assume it’s ok to deploy to production. Fire and forget! What could possibly go wrong! Of course, this is a somewhat naive approach and for any critical system it would be better to hook in some additional things.
One obvious improvement would be to introduce a more thorough testing regimen, which would include running unit tests as part of the build. Indeed, various libraries in the MirageOS project are already moving towards this model (e.g see the notes for links).
It’s even possible to go beyond unit tests and introduce more functional/systems/stress testing on the complete unikernel before permitting deployment. This would help to surface any wider issues as services interact and we could even simulate network conditions — achieving something like ‘staging on steroids’.
The scenario we have above also assumes that things work smoothly and nobody needs to know anything. It would be useful to hook in some form of logging and reporting, such that when a new unikernel is deployed a notification can be sent/stored somewhere. In the short term, there are likely existing tools and ways of doing this so it would be a matter of putting them together.
Overall, with the above model, we can easily set up a system where we go from writing code, to testing it via CI, to deploying it to a staging server for functional tests, and finally pushing it out into live deployment. All of this can be done with a few additional scripts and minimal interaction from the developer. We can achieve this because we don’t have to concern ourselves with large blobs of code, multiple different systems and keeping environments in sync. Once we’ve built the unikernel, the rest almost becomes trivial.
This is close enough for me to declare it as a ‘Heroku for unikernels’ but obviously, there’s much more we can (and should) do with such a system. If we extrapolate just a little from where we are now, there are a range of exciting possibilities to consider in terms of automation, scalability and distributed systems. Especially if we incorporate other aspects of the toolstack we’re working towards.
Part 2 of this series is where I’ll consider these possibilities, which will be more speculative and less constrained. It will cover the kinds of systems we can create once the tools are more mature and will touch on ideas around hyper-elastic clouds, embedded systems and what this means for the concept of immutable infrastructure.
Since we already have the ‘backbone’ of the toolchain in place, it’s easier to see where it can be extended and how.
Edit: The second part of this series is now up - “Self Scaling Systems”
Thanks to Anil Madhavapeddy and Thomas Leonard for comments on an earlier draft and Richard Mortier for his work on the deployment toolchain.Share / Comment
Last summer we announced the beta release of a clean-slate implementation of TLS in pure OCaml, alongside a series of blog posts that described the libraries and the thinking behind them. It took two hackers six months — starting on the beach — to get the stack to that point and their demo server is still going strong. Since then, the team has continued working and recently presented at the 31st Chaos Communication Congress.
The latest example goes quite a bit further than a server that just displays the handshake. This time, the team have constructed a Xen unikernel that’s holding a private key to a bitcoin address and are asking people to try and break in. Hence, they’ve called it the Bitcoin Piñata!*
The Piñata unikernel will transmit its private bitcoin key if you can successfully set up a TLS connection but it’s rigged so that it will only create that connection if you can present the certificate it’s expecting to see — which has been signed appropriately. Of course, you’re not being given the secret key with which to do that signing and that means there should be no way for anyone to form a TLS connection with the Piñata. In order to get the private key to the bitcoin address, you’ll have to smash your way in.
Helpfully (perhaps), things are set up so that you can make the Piñata talk to itself, allowing you to eavesdrop on a successful connection and see the encrypted traffic. In addition, all the code and libraries are open-source so you can look through any of the codebase. There isn’t anything that anyone will have to reverse engineer, which should make this a little more enjoyable.
This contest is set to run until mid-March or whenever the coins are taken. If someone does manage to get in, please do let us know how!
Of course there are many other ways to get at the private key and as many people like to comment, the human element is sometimes the weakest link — after all, a safe is only as secure as the person with the combination.
In this case, there is obviously a secret key or certificate somewhere that could be presented so it may be tempting to go hunting for that. Perhaps phishing attempts on the authors may yield a way forward, or maybe just straight-forward Rubber-hose cryptanalysis! Sure, these options might provide a result† but this is meant to be fun. The authors haven’t specified any rules but please be nice and focus on the technical things around the Piñata‡. Don’t be this guy.
Even though the Bitcoin Piñata is clearly a contest, nobody is deluding themselves into thinking that if the coins are still there in March, that somehow the stack can be declared ‘undefeated’ — while pleasing, that result wouldn’t necessarily prove anything. Contests have their place but as Bruce Schneier already pointed out, they are not useful mechanisms to judge security.
However, it does give us the chance to engage in some shameless self-promotion and try to draw vast amounts of attention to the work. That, and the chance to stress-test the stack in the wild. Ultimately, we want to use this code in production but must take a lot of care to get there and want to be sure that it can bear up. This is just one more way of learning what happens when putting something ‘real’ out there.
If the Bitcoins do end up being taken, then there’s definitely something valuable that the team can learn from that. Regardless of the Piñata, if we have more people exploring the TLS codebase or trying it out for themselves, it will undoubtedly be A Good Thing.
For clarity and to avoid any doubt, please be aware that the TLS codebase is missing external code audits and is not yet intended for use in any security critical applications. All development is done in the open, including the tracking of security-related issues, so please do consider auditing the code, testing it in your services and reporting issues.
* If you've never come across a piñata before, hopefully the gif in the post gives you an idea. If not, the wiki page will surely help, where I learned that the origin may be Chinese rather than Spanish!
† Of course, I'm not suggesting that anyone would actually go this far. I'm simply acknowledging that there is a human factor and asking that we put it aside.
‡ Edit to add: After seeing Andrea's tweet I should point out that any part of MirageOS, including the networking stack, OCaml runtime etc is a legitimate vector. It's why there's a manifest of the libraries that have been used to build the Piñata!Share / Comment