Amir Chaudhry

thoughts, comments & general ramblings

A logo for unikernels!

I’ve been working towards a logo for unikernels and it’s time to gather feedback from the unikernel community! There’s been a 99designs contest running for a while — where designers have submitted just over 400 entries — and a short-list for community feedback is now at https://99designs.co.uk/logo-design/vote-99gjyi (link open until at least 15th Jan).

At this stage, the aim is to gather wider feedback to inform the next step. The final design will be selected at a later date — after further iteration with designers. Once a logo is chosen, it will be used on unikernel.org and will help inform the design of the rest of the site (with input from a professional firm).

For more background you can read the design brief and you can even look through all the entries.

The unikernel approach as a concept has been getting much more attention recently and this will only increase throughout 2016.

Since there are a number of different projects — each approaching the problem from a different perspective — it will become even more important to ease the process of bringing in new users and contributors across all the projects.

The website at unikernel.org is already helping with this but we will also need a simple visual identity to go alongside it. Building this kind of brand will greatly improve the standing of unikernels as a whole and will be just as important as the work going on in each of the implementations. Ultimately, this will help to increase the rate of adoption.

What about projects’ existing logos?

To avoid any confusion, I’ll say upfront that this logo is not intended to replace any project’s existing brand! It’s meant to help us represent unikernels as a whole and each project will make its own choices, just as they do currently. The logo will mainly be used on unikernel.org and projects can make use of it if they wish.

Projects have already benefited from shared code and the same will be true from this shared brand.

What now?



CodeMesh 2015

These are the slides from my talk today at CodeMesh. This time around I was earlier in the schedule so I get to enjoy the rest of the conference! If you’re reading this at the conference now, please do follow the link in my talk to rate it and give me feedback!

The specific items I reference in the talk are below with links to more information.

Security and the Bitcoin Piñata

This is a bounty where we have locked away some bitcoin in a unikernel that is running our new TLS stack. This was a new model of running a bounty and has proven a great way to stress test the code in the wild.

You can follow up with more of the background work on the TLS stack by looking at the paper, “Not-quite-so-broken TLS: lessons in re-engineering a security protocol specification and implementation” and find other users of the libraries via https://nqsb.io.

Automated deployment

I’ve previously written about how we do unikernel deployments for MirageOS. Although the scripts themselves have evolved and become more sophisticated, these are still a good introduction.

Summoning on demand

The work on summoning unikernels was presented at Usenix this year and you can read the paper, “Jitsu: Just-In-Time Summoning of Unikernels”. The example I showed in the talk can be found at http://www.jitsu.v0.no.

Other resources

To get involved in the development work, please do join the MirageOS devel list and try out some of the examples for yourselves!



Governance of OCaml.org

Governance Screenshot

For several months, I’ve been working with the maintainers of OCaml.org projects to define and document the governance structure around the domain name. I wrote about this previously and I’m pleased to say that the work for this phase has concluded, with the document now live.

Recurring themes

There were some recurring themes that cropped up during my email discussions with people and I thought it would be useful to present a summary of them, along with my thoughts. Broadly, the discussions revolved around the philosophy of the document, the extent of its scope, and the depth of coverage. This discourse was very important for refining and improving the document.

Ideals and Reality

Some of the comments I received were essentially that the document did not represent how we should be organising ourselves. There was occasionally the sense (to me at least) that the only appropriate form of governance is a fully democratic and representative one.

That would entail things like official committees, ensuring that various communities/organisations were represented, and perhaps establishing some form of electoral processes. Overall, something relatively formal and quite carefully structured. Of course, instituting such an arrangement would necessarily require somewhat involved procedures, documentation, and systems — as well as the volunteer time to manage those processes.

These may be noble aims — and I expect one day we’ll be closer to such ideals — but one of the critical factors for the current approach was that we record how things are right now. In my experience, anything else is purely aspirational and therefore would have little bearing on how things currently function.

To put it another way, the current document must not describe the structure we desire to have, but the organisation we actually have — warts and all. Yes, right now we have a BDFL*, who personally owns the domain and therefore can do as he pleases with it. Irrespective of this, the community has been able to come together, coordinate themselves, and build very useful things around the domain name. This has happened independently of any formal community processes and, in my view, has largely been driven by people supporting each other’s works and generally trying to ‘do the right thing’.

Another aspect to point out is that such documents and procedures are not necessary for success. This is obvious when you consider how far the OCaml community has come in such a relatively short space of time. Given this, one might ask why we need any kind of written governance at all.

To answer that, I would say that once things grow beyond a certain scale, I believe it helps to gather the implicit behaviours and document them clearly. This allows us to be more systematic in our approach and also enables newcomers to understand how things work and become involved more quickly. In addition, having a clear record of how things operate in the present is an invaluable tool in helping to clarify what exactly we should work on changing for the future.

Extent of scope

It’s a little confusing to consider that ‘OCaml.org’ is simultaneously a collection of websites, infrastructural components, and projects. Disambiguating these from the wider OCaml community was important, and relatively straightforward, but there were a few questions about the relationship between the domain name and the projects that use it.

Although the governance covers the OCaml.org domain name, it necessarily has an impact on the projects which make use of it. This matters because anything under the OCaml.org domain will, understandably, be taken as authoritative by users at large. In a way, OCaml.org becomes the sum of the projects under it, hence it’s necessary to have some lightweight stipulations about what is expected of those projects.

Projects themselves are free to organise as they wish (BDFL/Democracy/etc) but there are certain guiding principles for OCaml.org that those projects are expected to be compatible with (e.g. openness, community-related, comms, etc). These stipulations are already met by the current projects, so codifying them is intended to clarify expectations for new projects.

Depth of coverage

Another of the recurring points was how the current document didn’t capture every eventuality. Although I could have attempted this, the end result would have been a lengthy document, full of legalese, that I expect very few people would ever read. The document would also have needed to cover eventualities that have not occurred (yet) and/or may be very unlikely to occur.

Of course, this is not a legal document. No-one can be compelled to comply with it and there are very few sanctions for anyone who chooses not to comply. However, for those who’ve agreed to it, acceptance signals a clear intent to take part in a social contract with the others involved in work around the domain name.

Overall, I opted for a lightweight approach that would cover how we typically deal with issues and result in a more readable document. Areas that are ‘uncharted’ for us should be dealt with as they have been so far — through discussion and action — and can subsequently be incorporated when we have a better understanding of the issues and solutions.

A solid starting position

The current version of the governance document is now live and it is very much intended to be a living document, representing where we are now. As the community continues to grow and evolve, we should revisit this to ensure it is accurate and is meeting our needs.

I look forward to seeing where the community takes it!

In case you’re interested, the set of links below covers the journey from beginning to end of this process.

* Yeah, I made sure to add Xavier to the BDFL list before publishing this. :)

Thanks to Ashish, Philippe and Anil for comments on an earlier draft.



Unikernels at PolyConf!

Updated: 14 July (see below)

Above are my slides from a talk at PolyConf this year. I was originally going to talk about the MISO tool stack and personal clouds (i.e. how we’ll build towards Nymote) but after some informal conversations with other speakers and attendees, I thought it would be way more useful to focus the talk on unikernels themselves — specifically, the ‘M’ in MISO. As a result, I ended up completely rewriting all my slides! Since I pushed this post just before my talk, I hope that I’m able to stick to the 30min time slot (I’ll find out very soon).

In the slides I mention a number of things we’ve done with MirageOS so I thought it would be useful to list them here. If you’re reading this at the conference now, please do give me feedback at the end of my talk!

To get involved in the development work, please do join the MirageOS devel list and try out some of the examples for yourselves!

Update — 14 July

The video of the talk is now available and it’s embedded below. Overall, the talk seemed to go well and there was enough time for questions.

At the end of the talk, I asked people to give me feedback and shared a URL, where I had a very short form. I had 21 responses with a rating of 4.52/5.00. I’m quite pleased with this and the feedback was also useful. In a nutshell, the audience seemed to really appreciate the walkthrough (which encourages me to make some screencasts). One comment was that I didn’t do enough justice to the security benefits. Specifically, I could have drawn more reference to the OCaml TLS work, which prevents bugs like Heartbleed. Security is definitely one of the key benefits of MirageOS unikernels (see here), so I’ll do more to emphasise that next time.

Here’s the video and I should mention that the slides seem to be a few seconds ahead. You’ll notice that I’ve left the feedback link live, too. If you’d like to tell me what you think of the talk, please do so! There are some additional comments at the end of this post.

Finally, here are a few things I should clarify:

  • Security is one of the critical benefits, which is why we need new systems for personal clouds (rather than legacy stacks).
  • We still get to use all the existing tools for storage (e.g. EBS); it doesn’t have to be Irmin.
  • The Introducing Irmin post is the one I was trying to point an audience member at.
  • When I mention the DNS server, I said it was 200MB when I actually meant 200KB. More info in the MirageOS ASPLOS paper.
  • I referred to the “HAT Project” and you should also check out the “Databox paper”.
  • A summary of other unikernel approaches is also available.


Towards Heroku for Unikernels: Part 2 - Self Scaling Systems

In the previous post I described the continuous end-to-end system that we’ve set up for some of the MirageOS projects — automatically going from a git push all the way to live deployment, with everything under version-control.

Everything I described previously already exists and you can set up the workflow for yourself, the same way many others have done with the Travis CI scripts for testing/build. However, there are a range of exciting possibilities to consider if we’re willing to extrapolate just a little from the tools we have right now. The rest of this post explores these ideas and considers how we might extend our system.

Previously, we had finished the backbone of the workflow and I discussed a few ideas about how we should flesh it out — namely more testing and some form of logging/reporting. There’s substantially more we could do when we consider how lean and nimble unikernels are, especially if we speculate about the systems we could create as our toolstack matures. A couple of things immediately come to mind.

The first is the ability to boot a unikernel only when it is required, which opens up the possibility of highly-elastic infrastructure. The second is the ease with which we can push, pull or otherwise distribute unikernels throughout a system, allowing new forms of deployment to both cloud and embedded systems. We’ll consider these in turn and see where they take us, comparing with the current ‘mirage-decks’ deployment I described in Part 1.

Demand-driven clouds

The way cloud services are currently provisioned means that you may have services operating and consuming resources (CPU, memory, etc), even when there is no demand for them. It would be significantly more efficient if we could just activate a service when required and then shut it down again when the demand has passed. In our case, this would mean that when a unikernel is ‘deployed to production’, it doesn’t actually have to be live — it merely needs to be ready to boot when demand arises. With tools like Jitsu (Just-In-Time Summoning of Unikernels), we can work towards this kind of architecture.

Summon when required

Jitsu allows us to have unikernels sitting in storage and then ‘summon’ them into existence. This can occur in response to an incoming request and with no discernible latency for the requester. While unikernels are inactive, they consume only the actual physical storage required and thus do not take up any CPU cycles, nor RAM, etc. This means that more can be achieved with fewer resources and it would significantly improve things like utilisation rates of hardware and power efficiency.
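To make the idea concrete, here’s a rough sketch in OCaml (emphatically not Jitsu’s actual code) of the ‘summon on demand’ pattern: a front-end listener boots the backend unikernel only when a request arrives and shuts it down again after a period of inactivity. The boot_unikernel and shutdown_unikernel hooks, the port and the idle timeout are hypothetical stand-ins for what the real toolstack does via the hypervisor.

```ocaml
(* A minimal sketch of demand-driven summoning, in the spirit of Jitsu.
   [boot_unikernel] and [shutdown_unikernel] are hypothetical hooks; in a
   real system they would start/stop a VM rather than flip a flag. *)

let idle_timeout = 30.0          (* seconds of inactivity before shutdown *)
let running = ref false          (* is the backend unikernel currently up? *)
let last_hit = ref (Unix.gettimeofday ())

let boot_unikernel () =
  if not !running then begin
    Printf.printf "booting unikernel...\n%!";
    running := true
  end

let shutdown_unikernel () =
  if !running then begin
    Printf.printf "shutting unikernel down (idle)\n%!";
    running := false
  end

(* Reap the backend once demand has passed. *)
let rec reaper () =
  let open Lwt.Infix in
  Lwt_unix.sleep 5.0 >>= fun () ->
  if !running && Unix.gettimeofday () -. !last_hit > idle_timeout then
    shutdown_unikernel ();
  reaper ()

(* Accept connections; make sure the backend exists before answering. *)
let serve port =
  let open Lwt.Infix in
  let sock = Lwt_unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
  Lwt_unix.setsockopt sock Unix.SO_REUSEADDR true;
  Lwt_unix.bind sock (Unix.ADDR_INET (Unix.inet_addr_any, port)) >>= fun () ->
  Lwt_unix.listen sock 10;
  let rec loop () =
    Lwt_unix.accept sock >>= fun (client, _) ->
    last_hit := Unix.gettimeofday ();
    boot_unikernel ();     (* summon on demand *)
    (* A real system would now proxy the request to the freshly booted
       unikernel; we just close the connection to keep the sketch short. *)
    Lwt_unix.close client >>= loop
  in
  loop ()

let () = Lwt_main.run (Lwt.join [ serve 8080; reaper () ])
```

The important property is the one described above: the ‘deployed’ unikernel costs nothing but storage until the first request arrives.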

In the case of the decks.openmirage.org unikernel that I discussed last time, it would mean that the site would only come online if someone had requested it and would shut down again afterwards.

In fact, we’ve already been working on this kind of system and Jitsu will be presented at NSDI in Oakland, California this May. In the spirit of looking ahead, there’s more we could do.

Hyper-elastic scaling

At the moment, Jitsu lets you set up a system where unikernels will boot in response to incoming requests. This is already pretty cool but we could take this a step further. If we can boot unikernels on demand, then we could use that to build a system which can automate the scale-out of those services to match demand. We could even have that system work across multiple machines, not just one host. So how would all this look in practice for ‘mirage-decks’?

Auto-scaling and dispersing our slide decks

Our previous toolchain automatically boots the new unikernel as soon as it is pulled from the git repo. Using Jitsu, our deployment machine would pull the unikernel but leave it in the repo — it would only be activated when someone requests access to it. Most of the time, it may receive no traffic and therefore would remain ‘turned off’ (let’s ignore webcrawlers for now). When someone requests to see a slide deck, the unikernel would be booted and respond to the request. In time it can be turned off again, thus freeing resources. So far, so good.

Now let’s say that a certain slide deck becomes really popular (e.g. posted to HackerNews or Reddit). Suddenly, there are many incoming requests and we want to be able to serve them all. We can use the one unikernel, on one machine, until it is unable to handle the load efficiently. At this point, the system can create new copies of that unikernel and automatically balance across them. These unikernels don’t need to be on the same host and we should be able to spin them up on different machines.
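As a purely illustrative sketch of that scale-out decision (none of this is an existing MirageOS tool), here’s a toy reconciliation loop: it derives the number of instances needed from the observed request rate, spreads new instances across hosts, and retires them again once the peak passes. The spawn/retire hooks, thresholds and host names are all hypothetical.

```ocaml
(* A toy sketch of the scale-out decision described above. A real system
   would spawn unikernels via the hypervisor and balance traffic across
   them; here we just print what we would do. *)

type instance = { id : int; host : string }

let max_reqs_per_instance = 100.0   (* requests/sec one unikernel can absorb *)

(* Hypothetical hooks into the platform. *)
let spawn ~host id = Printf.printf "spawning %d on %s\n" id host; { id; host }
let retire i = Printf.printf "retiring %d on %s\n" i.id i.host

(* Given the observed request rate, decide how many instances we need,
   capped so a traffic spike cannot become a "Denial of Credit". *)
let desired_count ~rate ~cap =
  max 1 (min cap (int_of_float (ceil (rate /. max_reqs_per_instance))))

(* Reconcile the running set of instances with the desired count. *)
let reconcile ~rate ~cap ~hosts instances =
  let want = desired_count ~rate ~cap in
  let have = List.length instances in
  if want > have then begin
    (* scale out: place new instances round-robin over the available hosts *)
    let fresh =
      List.init (want - have) (fun i ->
          let host = List.nth hosts ((have + i) mod List.length hosts) in
          spawn ~host (have + i))
    in
    instances @ fresh
  end else begin
    (* scale back down once the peak has passed *)
    let keep, drop = List.partition (fun inst -> inst.id < want) instances in
    List.iter retire drop;
    keep
  end

let () =
  let hosts = [ "eu-west"; "us-east"; "us-west" ] in
  let running = reconcile ~rate:850.0 ~cap:10 ~hosts [] in
  ignore (reconcile ~rate:40.0 ~cap:10 ~hosts running)
```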

To stretch this further, we can imagine coordinating the creation of those new unikernels nearer the source of that demand, for example starting off on a European cloud, then spinning up on the East coast US and finally over to the West coast of the US. All this could happen seamlessly and the process can continue until the demand passes or we reach a predefined limit — after all, given that we pay for the machines, we don’t really want to turn a Denial of Service into a Denial of Credit.

After the peak, the system can automatically scale back down to being largely dormant — ready to react when the next wave of interest occurs.

Can we actually do this?

If you think this is somewhat fanciful, that’s perfectly understandable — as I mentioned previously, this post is very much about extrapolating from where the tools are right now. However, unikernels actually make it very easy to run quick experiments which indicate that we could iterate towards what I’ve described.

A recent and somewhat extreme experiment ran a unikernel VM for each URL. Every URL on a small static site was served from its own, self-contained unikernel, complete with its own web server (even the ‘rss.png’ icon was served separately). You can read the post to see how this was done and it also led to an interesting discussion on the mailing list (e.g. if you’re only serving a single item, why use a web server at all?). Of course, this was just an experiment but it demonstrates what is possible now and how we can iterate, uncover new problems, and move forward. One such question is how to automatically handle networking during a scale-out, and this is an area where tools like Signpost can be of use.

Overall, the model I’ve described is quite different to the way we currently use the cloud, where the overhead of a classic OS is constantly consuming resources. Although it’s tempting to stick with the same frame of reference we have today, we should recognise that the current model is inextricably intertwined with the traditional software stacks themselves. Unikernels allow completely new ways of creating, distributing and managing software and it takes some thought in order to fully exploit their benefits.

For example, having a demand-driven system means we can deliver more services from just the one set of physical hardware — because not all those services would be consuming resources at the same time. There would also be a dramatic impact on the economics, as billing cycles are currently measured in hours, whereas unikernels may only be active for seconds at a time. In addition to these benefits, there are interesting possibilities in how such scale-outs can be coordinated across different devices.

Hybrid deployments

As we move to a world with more connected devices, the software and services we create will have to operate across both the cloud and embedded systems. There have been many names for this kind of distributed system, ranging from ubiquitous computing to dust clouds and the ‘Internet of Things’ but they all share the same idea of running software at the edges of the network (rather than just cloud deployments).

When we consider the toolchain we already have, it’s not much of a stretch to imagine that we could also build and store a unikernel for ARM-based deployments. Those unikernels can be deployed onto embedded devices and currently we target the Cubieboard2.

We could make such a system smarter. Instead of having the edge devices constantly polling for updates, our deployment process could directly push the new unikernels out to them. Since these devices are likely to be behind NATs and firewalls, tools like Signpost could deal with the issue of secure connectivity. In this way, the centralised deployment process remains as a coordination point, whereas most of the workload is dealt with by the devices the unikernels are running on. If a central machine happens to be unavailable for any reason, the edge devices would continue to function as normal. This kind of arrangement would be ideal for Internet-of-Things style deployments, where it could reduce the burden on centralised infrastructure while still enabling continuous deployment.
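As a rough sketch of that push model (the device endpoints, image path and ‘/update’ route are all hypothetical, and the Signpost/NAT-traversal part is omitted entirely), the deployment host would simply send the freshly built ARM image to each registered device after a successful build:

```ocaml
(* A minimal sketch of push-based deployment: after a build, send the new
   unikernel image to each known edge device rather than waiting for the
   devices to poll. Device URLs and the image filename are placeholders. *)

open Lwt.Infix

let devices = [
  "http://cubieboard-1.local:8080/update";
  "http://cubieboard-2.local:8080/update";
]

(* Read the built unikernel image from disk. *)
let read_image path =
  Lwt_io.with_file ~mode:Lwt_io.Input path (fun ic -> Lwt_io.read ic)

(* POST the image to one device and report the response code. *)
let push_to image url =
  let body = Cohttp_lwt.Body.of_string image in
  Cohttp_lwt_unix.Client.post ~body (Uri.of_string url) >|= fun (resp, _) ->
  let code = Cohttp.Code.code_of_status (Cohttp.Response.status resp) in
  Printf.printf "%s -> %d\n%!" url code

let () =
  Lwt_main.run
    (read_image "mir-decks.img" >>= fun image ->
     Lwt_list.iter_p (push_to image) devices)
```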

In this scenario, we could serve the traffic for ‘mirage-decks’ from a unikernel on a Cubieboard2, which could further minimise the cost of running such infrastructure. It could be configured such that if demand begins to peak, then an automated scale-out can occur from the Cubieboard2 directly out onto the public cloud and/or other Cubieboards. Thus, we can still make use of third-party resources but only when needed and of the kind we desire. Of course, running a highly distributed system leads to other needs.

Remember all the things

When running services at scale it becomes important to track the activity and understand what is taking place in the system. In practice, this means logging the activity of the unikernels, such as when and where they were created and how they perform. This becomes even more complex for a distributed system.

If we also consider the logging needs of a highly-elastic system, then another problem emerges. Although scaling up a system is straightforward to conceptualise, scaling it back down again presents new challenges. Consider all the additional logs and data that have been created during a scale-out — all of that history needs to be merged back together as the system contracts. To do that properly, we need tools designed to manage distributed data structures, with a consistent notion of merges.

Irmin addresses these kinds of needs and it enables a style of programming very similar to the Git workflow, where distributed nodes fork, fetch, merge and push data between each other. Building an end-to-end logging system with Irmin would enable data to be managed and merged across different nodes and keep track of activity, especially in the case of a scale down. The ability to capture such information also means the opportunity to provide analytics to the creators of those unikernels around performance and usage characteristics.
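To illustrate the kind of merge involved, here’s a small three-way merge of append-only logs written against plain lists rather than Irmin’s actual API (which I won’t pin down here): branches that diverged from a common ancestor during a scale-out are recombined into a single history when the system contracts.

```ocaml
(* A sketch of the Git-like merge described above, using plain lists.
   Each node keeps an append-only log; when the system scales back down,
   logs that diverged from a common ancestor are merged into one history. *)

type entry = { ts : float; node : string; msg : string }

(* Entries appended since the common ancestor on a given branch
   (assumes the branch still starts with the shared ancestor prefix). *)
let new_entries ~ancestor branch =
  let n = List.length ancestor in
  List.filteri (fun i _ -> i >= n) branch

(* Three-way merge: keep the shared history, then interleave what each
   side added, ordered by timestamp so the combined log reads naturally. *)
let merge ~ancestor left right =
  let adds =
    new_entries ~ancestor left @ new_entries ~ancestor right
    |> List.sort (fun a b -> compare a.ts b.ts)
  in
  ancestor @ adds

let () =
  let e ts node msg = { ts; node; msg } in
  let ancestor = [ e 1.0 "deploy" "unikernel booted" ] in
  let eu = ancestor @ [ e 2.0 "eu-west" "served 120 requests" ] in
  let us = ancestor @ [ e 2.5 "us-east" "served 300 requests" ] in
  merge ~ancestor eu us
  |> List.iter (fun x -> Printf.printf "%.1f %-8s %s\n" x.ts x.node x.msg)
```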

The use of Irmin wouldn’t be limited to logging as the unikernels themselves could use it for managing data in lieu of other file systems. I’ll refrain from extrapolating too far about this particular tool as it’s still under rapid development and we’ll write more as it matures.

On immutable infrastructure

You may have noticed that one of the benefits of the unikernel approach arises because the artefacts themselves are not altered once they’re created. This is in line with the recent resurgence of ideas around ‘immutable infrastructure’. Although there isn’t a precise definition of this, the approach is that machines are treated as replaceable and can be regularly re-provisioned with a known state. Various tools help existing systems to achieve this but in the case of unikernels, everything is already under version control, which makes managing a deployment much easier.

As our approach is already compatible with such ideas, we can take it a step further. Immutable infrastructure essentially means the artefact produced doesn’t matter. It’s disposable because we have the means to easily recreate it. In our current example, we still ship the unikernel around. In order to make this ‘fully immutable’, we’d have to know the state of all the packages and code used when building the unikernel. That would give us a complete manifest of which package versions were pulled in and from which sources. Complete information like this would allow us to recreate any given unikernel in a highly systematic way. If we can achieve this, then it’s the manifest which generates everything else that follows.

In this world-view, the unikernel itself becomes something akin to caching. You use it because you don’t want to rebuild it from source — even though unikernels are quicker to build than a whole OS/App stack. For more security critical applications, you may want to be assured of the code that is pulled in, so you examine the manifest file before rebuilding for yourself. This also allows you to pin to specific versions of libraries so that you can explicitly adjust the dependencies as you wish. So how do we encode the manifest? This is another area where Irmin can help as it can keep track of the state of package history and can recreate the environment that existed for any given build run. That build run can then be recreated elsewhere without having to manually specify package versions.
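As a sketch of what such a manifest might contain (the format here is made up purely for illustration), it only needs to pin each package to an exact version and source commit, plus the compiler and build target, for a rebuild elsewhere to check that it is producing the same unikernel:

```ocaml
(* A hypothetical build manifest: enough information to recreate a given
   unikernel in a systematic way. Versions and commits are placeholders. *)

type package = { name : string; version : string; commit : string }

type manifest = {
  target   : string;          (* e.g. "xen" or "unix" *)
  compiler : string;          (* OCaml compiler version used for the build *)
  packages : package list;    (* exact package set pulled in by the build *)
}

let example = {
  target = "xen";
  compiler = "4.02.1";
  packages = [
    { name = "mirage"; version = "2.5.0";  commit = "deadbeef" };
    { name = "cohttp"; version = "0.17.1"; commit = "cafef00d" };
  ];
}

(* Rebuilding "fully immutably" means refusing to proceed if the resolved
   package set differs from what the manifest records. *)
let check_against ~resolved m =
  List.for_all
    (fun p ->
      match List.assoc_opt p.name resolved with
      | Some v -> v = p.version
      | None -> false)
    m.packages

let () =
  let resolved = [ ("mirage", "2.5.0"); ("cohttp", "0.17.1") ] in
  Printf.printf "reproducible: %b\n" (check_against ~resolved example)
```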

There’s a lot more to consider here as this kind of approach opens up new avenues to explore. For the time being, we can recognise that the unikernel approach lends itself to achieving immutable infrastructure.

What happens next?

As I mentioned at the beginning of this post, most of what I’ve described is speculative. I’ve deliberately extrapolated from where the tools are now so as to provoke more thoughts and discussion about how this new model can be used in the wild. Some of these things we’re already working towards, but there are many other uses that may surprise us — we won’t know until we get there and experimenting is half the fun.

We’ll keep marching on with more libraries, better tooling and improving quality. What happens with unikernels in the rest of 2015 is largely up to the wider ecosystem.

That means you.

Edit: discuss this post on devel.unikernel.org


Thanks to Thomas Gazagnaire and Richard Mortier for comments on an earlier draft.
