I'm speaking to the Entrepreneur First cohort this morning about the future of resilient, distributed systems and what I'm working on to get us there. First, I'm describing the kinds of solutions we have today, the great things they offer developers and the issues they create. This leads into the new toolstack we're creating, called the MISO stack, along with its benefits and trade-offs.
I'm spending more time talking about Mirage OS -- the 'M' in the MISO stack -- because the workflow we've developed here underpins how we build, deploy and maintain such applications at scale. As an example of how things can work, I point to my earlier post on how to go from Jekyll to unikernel. This uses Travis CI to do all the hard work, and all the relevant artefacts, including the final VM, can be version-controlled through Git. I actually deployed this post while the audience was watching, so that I could point at the build logs.
One of the use cases for our toolstack is to make it possible for individuals to create and maintain their own piece of the cloud, a project called Nymote. This will also make it possible to run the Internet of my Things, which is itself related to other things I'm working on, such as the Hub of All Things (HAT) and the User Centric Networking projects.
This is an exciting summer for all the tools we're putting together: we've recently announced Mirage OS v2.0, which now works on ARM, we're going full steam ahead with Irmin and we're working hard on improvements to the OCaml ecosystem. It's a great time to explore these projects, learn a new language and build awesome stuff.
A couple of weeks ago, I took part in Seedhack 5.0, on the theme of life-logging. My team were overall winners with Clarity, our calendar assistant app. This post captures my experiences of what happened over the weekend, the process of how we built the app and the main things I learned. You'll find out what Clarity is at the end -- just like we did.
The weekend began with some information on the APIs available to us, which was followed by pizza and mingling with everyone. I spoke to a few people about what they were working on and the technologies they were used to. It was good to find a mixture of experience and I was specifically looking for folks with an interest in functional programming -- that's how I first met Vlad over Twitter.
After pizza, those people with ideas, even if not fully formed, were invited to share them with the room. I came into Seedhack with specific thoughts on the kind of things I wanted to work on, so I spoke about one of those.
I described the problem of silos, poor interoperability and how all the life-logging data should really be owned by the user. That would allow third parties to request access and provide way more value to users, while maintaining privacy and security. Building a centralised service makes a lot of sense in the first instance but what's more disruptive than eschewing the current model of yet-another-silo and putting the user in control? If that sounds familiar, it's because I'm trying to solve these problems already.
I'm working on an open-source toolstack for building distributed systems that I call the MISO stack, which is analogous to the LAMP stack of old but is based on Mirage OS. With this stack, I'm putting together a system to help people create and run their own little piece of the cloud -- Nymote. The introductory post and my post on 'The Internet of my Things' have more detail on why I'm working on this.
For systems like this to be viable, we must be willing to trust them with the core applications of Email, Contacts and Calendar. Without advanced and robust options for running these, it's unlikely that anyone (including me) would want to switch away from the current providers. Of these three applications, I decided to talk about the contact management solution, since I happen to have wireframes and thought it might be simpler to implement something over the weekend.
There was quite a bit of interest in the overall concept but what really piqued my curiosity was that someone else presented some thoughts around Calendars and analytics. After a brief chat, we decided to join forces and tackle the problems of Calendar management. The team had a great mix of experience from product to design and several of them had worked together before.
Amir - Product - Worked in several startups, product & programme management experience, currently a Post Doc at Cambridge University Computer Science dept.
River - Product - Programme Manager at Dotforge Accelerator and lead organiser of StartupBus UK 2014.
Mani - Developer - Freelance web dev (CMS and APIs), winner of multiple hackathons, currently studying at Sheffield University.
Vlad - Developer - Started programming long ago and attended many competitions and hackathons. Currently studying Computer Science at the University of Southampton.
João - Developer - PhD in Theoretical Physics, Python enthusiast and moving into data science, currently doing data analysis at Potential.
Jeremy - Designer - Freelance UI/UX Designer, hackathon enthusiast, currently studying medicine at Sheffield University.
We had all decided to work together and we knew it would be on the problem of calendar management and analytics. We were fired up but it quickly became obvious that was all we knew.
The next four to five hours were spent discussing the rough shape of what we were going to build, what specific problem we thought we were solving and whether there were enough people with such a problem to care.
We had a look at each other's calendars and talked about how we each use them and the things we like and dislike about them. For example, I have around nine calendars and I curate them carefully, adding contextual information and sometimes even correcting old events to reflect what happened. We even bounced around the idea of the contact management app several times as well as a few other ideas that came up during the discussions.
These conversations took a while and it seemed like we were going around in circles. Despite this, it didn't feel particularly frustrating. I realised that the same sticking point was coming up repeatedly because we were forcing ourselves to imagine a prototypical customer and the problems they might have. This was never going to work since we wouldn't have time to go and find such people and do basic customer development. Far better to constrain the problem to something we experience so that we can look to ourselves for initial customer feedback. Once we did this, things seemed to go a little faster and taking breaks for food helped us keep our energy up.
There were a few occasions where I looked around the room and saw other teams with their heads down, headphones in, and bashing away at keyboards -- we hadn't even figured out what we were doing yet. Despite this, it was a great exercise because it allowed all of us to get a feel for what aspects each of us cared about most and it helped us form some kind of shared language for the product.
The only outcome from this first evening was an outline, but it was an important one. It distilled what we were going to work on and its components. We did this so we'd have a clear starting point the next morning and could get going quickly. Here's a paraphrased version of what we sent ourselves.
Smart Calendar App. We are collecting data from mobile and from desktop.
- Mobile includes:
  - Call logs
  - Location (if we can)
  - Messages
  - What application is being used and when
- Desktop includes:
  - What application is active and timestamps of it
  - Location?
  - Git logs ... ?
- Taking logs of their existing calendars. Working out what people are doing.
- Extrapolating info from active application (e.g. browser page)
Present info back via calendar UI
- Need to turn all this information into webcal events
Useful info we want
- Time spent travelling (how much?)
- Time in meetings
- Time on phone
- Who the meetings/calls were with
- Relevant docs/emails these are linked with
- Use labels
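The outline ends with turning activity into 'webcal events'. In practice that means serving iCalendar text (RFC 5545) over HTTP, because webcal:// is just an unofficial URI scheme for subscribing to such a file. Below is a minimal sketch in OCaml, the language used elsewhere on this blog. It is not what Clarity actually ran. The `activity` record, its field names and the PRODID value are hypothetical, and timestamps are assumed to arrive already formatted as iCalendar UTC date-times.

```ocaml
(* Hypothetical record for one inferred activity (a call, a meeting, a block
   of time in one app). Timestamps are assumed to be pre-formatted as
   iCalendar UTC date-times, e.g. "20140607T160000Z". *)
type activity = {
  uid : string;      (* must be globally unique, e.g. "call-42@example.invalid" *)
  summary : string;  (* human-readable title, e.g. "Call with Lee" *)
  start : string;    (* DTSTART value *)
  finish : string;   (* DTEND value *)
}

(* RFC 5545 requires backslash, semicolon, comma and newline to be escaped
   inside TEXT values such as SUMMARY. *)
let escape_text s =
  let b = Buffer.create (String.length s) in
  String.iter
    (function
      | '\\' -> Buffer.add_string b "\\\\"
      | ';' -> Buffer.add_string b "\\;"
      | ',' -> Buffer.add_string b "\\,"
      | '\n' -> Buffer.add_string b "\\n"
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

(* iCalendar content lines are terminated by CRLF. *)
let crlf = "\r\n"

(* One VEVENT per activity; DTSTAMP records when the file was generated. *)
let vevent ~stamp a =
  [ "BEGIN:VEVENT";
    "UID:" ^ a.uid;
    "DTSTAMP:" ^ stamp;
    "DTSTART:" ^ a.start;
    "DTEND:" ^ a.finish;
    "SUMMARY:" ^ escape_text a.summary;
    "END:VEVENT" ]

(* Wrap the events in a VCALENDAR object that a webcal:// URL could serve.
   Folding of lines longer than 75 octets is omitted to keep this short. *)
let vcalendar ~stamp activities =
  let lines =
    [ "BEGIN:VCALENDAR"; "VERSION:2.0"; "PRODID:-//clarity-sketch//EN" ]
    @ List.concat (List.map (vevent ~stamp) activities)
    @ [ "END:VCALENDAR" ]
  in
  String.concat crlf lines ^ crlf
```

Any calendar client that supports subscriptions could then poll this output, so the 'where did my time go' view would sit alongside the user's existing calendars.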
Then it was time for some late night snacks.
With one challenge out of the way, the next one was finding somewhere to sleep for a few hours (Campus was closing from 01:30). I had nothing planned but luckily for me, a couple of team members had booked a hotel room for the night. The minor complication was that we had to first find the hotel and then somehow get six people into a room meant for two -- without the night manager kicking us out. That's a whole other story, but it suffices to say that James Bond has nothing to worry about.
After some card games and a few hours of sleep, we headed back to Campus and during this walk, we came up with 'Clarity' as the name of the application.
Once we arrived, development began. River volunteered his digital assets to the cause (i.e. his whole Google life). Mani worked on the Android app, Vlad on a Chrome extension and João on pulling in the Google Apps data and combining it with data from the various platform apps. Jeremy worked on the front-end of the site, while River and I began wireframing the UI and user flow through the site.
Once the development was well underway, I realised how superfluous 'the business guys' can be. It would have been easy to simply sit there and let everyone get on with it but there were other things River and I did while developers were writing code.
Wireframing - We spent time thinking about what a user would actually see and engage with once they visited the Clarity site. We made a lot of sketches on paper and this was helpful because communication with the team was smoother with something to guide the discussion. It also helped to inform the design work and gave River and me something to show to potential users.
Talk to people - aka early customer development. We already knew that we were our own customers but it was useful to talk to other people for two reasons. Firstly, to get an idea of how they use their calendars and whether they have similar problems to us, and secondly, to see what thoughts we prompt when we describe our solution (or show our wireframes). This led to useful information on how we should refine the product and position ourselves against perceived competition.
Refine the product - Going through the wireframing and talking to people helped us come up with several new ideas for how to display the data back to users. Some of these seemed great at the time, but after showing some paper sketches to other people, we realised customers didn't care about certain things, so we discarded them, even though Jeremy had already done the work of putting together the UI for them (sorry, Jeremy!).
Examine the competition - After we described what we were working on, a few people mentioned potential competitors and asked how we were different. Initially, we didn't know much about these companies, but it was something we could explore while development was underway and use to consider our positioning.
Remind people to regroup - Every few hours, we would make sure everyone caught up with each other. We would check that things were going well, share what we'd learned from talking to people and discuss any technical problems and possible workarounds -- including changing the scope of the product. The discussions we'd had on Friday meant that we spent less time debating when these questions came up during the weekend.
Work on the pitch - River and I began working on the pitch from just after Saturday lunchtime and kept building on it until Sunday afternoon. This, combined with showing our sketches to people, made it much easier to think about the story we wanted to tell the audience. In turn, that made it easier to think about the product development that had to be completed by Sunday, especially in terms of a kick-ass demo.
Development carried on through the night and we took a break to watch some sports via River's laptop -- at this point Clarity was actually logging this and other events. The next morning, we reiterated what we needed to get done for the demo and I was pretty ruthless about practising the pitch. River and I practised endlessly while everyone else made sure the technical pieces were working smoothly. We had a lot of moving parts and making sure they were glued together seamlessly was important. At this point, we knew what Clarity was and how to tell its story.
We all have calendars and we put a lot of time and effort into managing them but get very little back. A simple glance at your calendar for the past month will show you a sea of information with no idea where your time was actually spent. We believe your calendar should be working harder for you. Your calendar should give you clarity.
Over the course of the weekend, we built tools that can go through your calendar and understand the events you're involved in and tie them back to the relevant emails, documents and people. With a suite of software that spans across your GDrive, Chrome and Android, we're able to combine your calendars with rich, contextual information so you can really understand what your time is being spent on.
We built this system and plugged it into River's digital life. If we take a look at River's summary for the last month, we see that he spent around 32 hours in meetings in London, despite living in Sheffield. We can also see that he spent an hour on the phone with someone called Lee, though all of those were short calls. The next person also totalled an hour on the phone, but across only 3 calls. Already, River has learned something about the people he interacts with most and how. We can also drill down further and see all this activity presented in a calendar view, except this now represents where his time actually went, rather than where he thought it went. For example, he's most active via text message between 4pm and 5pm during the week, and we can see that he spent a few hours watching sports last night.
Clarity can do much more than provide an accurate retrospective view of your time. Since it interacts with all the important components of your life, like your phone and laptop, it can even perform helpful actions for you. For example, say I have a meeting set up with River but I want to reschedule it. I simply send a text to him as I normally would, suggesting that we move it to another day. Clarity can pick up that message and is smart enough to understand its intent, find the relevant calendar event and reschedule it automatically. River doesn't have to lift a finger and his diary is always up to date. It's easy to imagine a future where we might never have to add or edit events ourselves.
Clarity is a smart calendar assistant that understands the context around you, provides you with insight into your life and helps you seamlessly organise your future. You can find out more at http://clarityapp.me
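The phone summary in the demo showed an hour with Lee spread over many short calls, against an hour with someone else over just three. That comparison comes from grouping the call log by contact. Here is a minimal sketch of that step in OCaml. It assumes a hypothetical `call` record holding a contact name and a duration in seconds. The real app pulled its data from Android, and its data model isn't described here.

```ocaml
(* Hypothetical call-log record, as a phone app might report it. *)
type call = { contact : string; seconds : int }

(* Total talk time and number of calls per contact, busiest contact first. *)
let summarise (calls : call list) : (string * int * int) list =
  let tbl = Hashtbl.create 16 in
  List.iter
    (fun c ->
      let total, n =
        match Hashtbl.find_opt tbl c.contact with
        | Some acc -> acc
        | None -> (0, 0)
      in
      Hashtbl.replace tbl c.contact (total + c.seconds, n + 1))
    calls;
  Hashtbl.fold (fun name (total, n) acc -> (name, total, n) :: acc) tbl []
  |> List.sort (fun (_, t1, _) (_, t2, _) -> compare t2 t1)
```

Dividing the total by the call count gives the average call length. That figure is what separates lots of short calls from a few long ones when the totals are the same.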
After the pitches, we met a lot of people who had the same problems as we did with calendar management. Several offered to be beta testers. After waiting for the judges, prizes were announced and Clarity was declared the overall winner of Seedhack 5.0!
Looking back, there were a lot of things we did which I think helped us get to the winning slot, so I thought I'd summarise them here.
Think first - We spent time up front to define what we were going to work on. I thought this step was crucial as it meant we all understood the shape of the problem but also the areas that each of us were interested in. That helped later in the weekend as we could refer back to things we discussed on Friday.
Move fast - Once we had figured out what we were doing, it was a matter of building the software to gather and crunch the data. A lot of this was done in parallel, as Jeremy worked on the front-end UI while Mani, João and Vlad took care of the data aggregation, analytics and platform products. Don't be afraid to throw away ideas if you find that they don't work for people, and remember it's a hackathon (i.e. gruesome hacks are the norm).
Remember the demo - At some point you're going to be forced to stand up and talk about what you've done. We started thinking about this from Saturday lunchtime and sketched out the slides and the elements of the app we wanted to show. This helped inform the UI and technology that we were building and the pitch never felt like it was rushed.
Practice, practice, practice - we were told we'd have 3 minutes to pitch/demo and maybe a few additional questions. I made sure River and I practised repeatedly to get our time down to 3 minutes and ensure we were getting across everything we wanted to. The important thing with such a short amount of time is that we were forced to cut things out as well as ensure we emphasised the main points. It was a ruthless exercise in saying 'no'. During the actual pitches, we realised everyone was taking longer (without repercussions), so I added an extra 30 seconds to cover the potential market sizes and business models.
Leave artefacts - After the pitches, we knew that we had to take the site down. It was built at high speed and in a way that ended up exposing someone's data to the internet at large (thanks, River!). It might have been somewhat easier to build something using faked data that we could happily share the URL to and leave online. On the other hand, our demo would not have been half as compelling if we weren't running it real-time on live data.
This is a product we all want to use and the team is interested in taking this forward. There are a lot of things to think about and many things we would build differently so we're discussing the next steps. For example, there are likely ways to empower the end-users to control their data and give them more flexibility, even though the work at the hackathon was already quite impressive. Given how well Seedhack went, you might even see us at Seedcamp Week later in the year. If you think this is something you'd like to work on with us, do get in touch!
To wrap things up, here's a victory selfie!
I've been learning OCaml for some time now but not really had a problem that I wanted to solve. As such, my progress has been rather slow and sporadic and I only make time for exercises when I'm travelling. In order to focus my learning, I have to identify and tackle something specific. That's usually the best way to advance and I recently found something I can work on.
As I've been trying to write more blog posts, I want to be able to keep as much content on my own site as possible and syndicate my posts out to other sites I run. Put simply, I want to be able to take multiple feeds from different sources and merge them into one feed, which will be served from some other site. In addition, I also want to render that feed as HTML on a webpage. All of this has to remain within the OCaml toolchain so it can be used as part of Mirage (i.e. I can use it when building unikernels).
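The merge step itself is small: gather the entries from every source, collapse duplicates and sort newest first. Here is a minimal sketch in OCaml. It uses a hypothetical three-field `entry` record, not a real Atom type, and stores the `updated` time as a Unix timestamp for simplicity. Removing duplicates by id follows RFC 4287, which says that entries sharing an atom:id within a feed represent the same entry.

```ocaml
(* A pared-down entry: just enough to merge. [updated] is a Unix timestamp
   here for simplicity; a real parser would decode Atom's RFC 3339 dates. *)
type entry = { id : string; title : string; updated : float }

(* First [n] elements of a list (or all of them if it is shorter). *)
let take n l =
  let rec go n acc = function
    | x :: xs when n > 0 -> go (n - 1) (x :: acc) xs
    | _ -> List.rev acc
  in
  go n [] l

(* Merge several feeds into one river of news: entries sharing an id are
   treated as versions of the same entry and only the newest is kept. *)
let merge_feeds ?(limit = max_int) (feeds : entry list list) : entry list =
  let latest = Hashtbl.create 64 in
  List.iter
    (List.iter (fun e ->
         match Hashtbl.find_opt latest e.id with
         | Some old when old.updated >= e.updated -> ()
         | _ -> Hashtbl.replace latest e.id e))
    feeds;
  Hashtbl.fold (fun _ e acc -> e :: acc) latest []
  |> List.sort (fun a b -> compare b.updated a.updated)
  |> take limit
```

The same merged list could feed both outputs: an Atom feed for syndication and an HTML page.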
What I'm describing might sound familiar and there's a well-known tool that does this called Planet. It's a 'river of news' feed reader, which aggregates feeds and can display posts on webpages. You can find the original Planet and its successor Venus, both written in Python. However, Venus seems to be unmaintained: there are a number of unresolved issues and pull requests that have been languishing for quite some time with no discussion. There does appear to be a more active Ruby implementation called Pluto, with recent commits and no reported issues.
Although I could use one of the above options, it would be much more useful to keep everything within the OCaml ecosystem. This way I can make the best use of the unikernel approach with Mirage (i.e. lean, single-purpose appliances). Obviously, the existing options don't lend themselves to this approach, and there are known bugs because a lot has changed on the web since Planet Venus (e.g. the adoption of HTML5). Having said that, I can learn a lot from the existing implementations and I'm glad I'm not embarking into completely uncharted territory.
In addition, the OCaml version doesn't need to (and shouldn't) be written as one monolithic library. Instead, pulling together a collection of smaller, reusable libraries that present clear interfaces to each other would make things much more maintainable. This would bring substantially greater benefits to everyone and OPAM can manage the dependencies.
The first cut is somewhat straightforward as we have a piece that deals with the consumption and manipulation of feeds and another that takes the result and emits HTML. This is also how the original Planet is put together, with a library called feedparser and another for templating pages.
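This split fits OCaml's module system well. Each piece gets a signature, and the aggregator becomes a functor over the two. Here is a minimal sketch. The names `FEED`, `RENDER` and `Planet` are hypothetical and don't come from any existing library. The toy modules exist only to show the composition compiling and running.

```ocaml
(* Interface of the feed side: combine several feeds and list their entries. *)
module type FEED = sig
  type t
  type entry
  val merge : t list -> t
  val entries : t -> entry list
end

(* Interface of the presentation side: turn entries into an HTML page. *)
module type RENDER = sig
  type entry
  val page : entry list -> string
end

(* The aggregator only knows the two interfaces, so either side can be
   swapped (or reused elsewhere) without touching the other. *)
module Planet (F : FEED) (R : RENDER with type entry = F.entry) = struct
  let build (feeds : F.t list) : string = R.page (F.entries (F.merge feeds))
end

(* Toy implementations: a "feed" is just a list of titles. No HTML escaping
   is done, so this is not safe for untrusted input. *)
module Toy_feed = struct
  type entry = string
  type t = entry list
  let merge = List.concat
  let entries f = f
end

module Toy_render = struct
  type entry = string
  let page es =
    "<ul>" ^ String.concat "" (List.map (fun e -> "<li>" ^ e ^ "</li>") es) ^ "</ul>"
end

module Toy_planet = Planet (Toy_feed) (Toy_render)
```

In the real project, an Atom library would implement the feed signature and an HTML library the rendering one, and OPAM would manage the dependencies between them.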
For the feed-parsing aspect, I can break it down further by considering Atom and RSS feeds separately and then even further by thinking about how to (1) consume such feeds and (2) output them. Then there is the HTML component, where it may be necessary to consider existing representations of HTML. These are not new ideas and since I'm claiming that individual pieces might be useful then it's worth finding out which ones are already available.
The easiest way to find existing libraries is via the OPAM package list. Some quick searches for net bring up a lot of packages. The most relevant of these seem to be xmlm, ocamlrss, cow and maybe xmldiff. I noticed that nothing appears when searching for Atom, but I do know that cow has an Atom module for creating feeds. In terms of turning feeds into pages and HTML, I'm aware of rss2html used on the OCaml website and parts of ocamlnet that may be relevant (e.g. netstring), as well as cow. There is likely to be other code I'm missing but this is useful as a
Overall, a number of components are already out there but it's not obvious whether they're compatible (e.g. HTML) and there are still gaps (e.g. Atom). Since I also want to minimise dependencies, I'll try to use whatever works but may ultimately have to roll my own. Either way, I can learn from what already exists. Perhaps I'm being overconfident, but if I can break things down sensibly and keep the scope constrained then this should be an achievable project.
As this is an exercise for me to learn OCaml by solving a problem, I need to break it down into bite-size pieces and take each one at a time. Practically speaking, this means limiting the scope to be as narrow as possible while still producing a useful result for me. That last part is important as I have specific needs and it's likely that the first thing I make won't be particularly interesting for many others.
For my specific use-case, I'm only interested in dealing with Atom feeds as that's what I use on my site and others I'm involved with. Initial feedback is that creating an Atom parser will be the bulk of the work and I should start by defining the types. To keep this manageable, I'm only going to deal with my own feeds instead of attempting a fully compliant parser (in other words, I'll only consider the subset of RFC4287 that's relevant to me). Once I can parse, merge and write such feeds I should be able to iterate from there.
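As a starting point for 'defining the types', here is one way the relevant subset of RFC 4287 might look in OCaml. This is a sketch under my own assumptions. Dates are kept as raw RFC 3339 strings, text constructs are plain strings, and only the feed-level author rule from section 4.1.1 is checked.

```ocaml
(* Person construct, used for atom:author (RFC 4287, section 3.2):
   name is required, uri and email are optional. *)
type person = { name : string; uri : string option; email : string option }

(* atom:link; a missing rel is interpreted as "alternate" (section 4.2.7.2). *)
type link = { href : string; rel : string option }

module Entry = struct
  type t = {
    id : string;              (* atom:id, required *)
    title : string;           (* atom:title, required *)
    updated : string;         (* atom:updated, RFC 3339, required *)
    authors : person list;    (* may be inherited from the feed *)
    links : link list;
    summary : string option;
    content : string option;
  }
end

module Feed = struct
  type t = {
    id : string;
    title : string;
    updated : string;
    authors : person list;
    links : link list;
    entries : Entry.t list;
  }

  (* Section 4.1.1: a feed needs at least one atom:author unless every
     entry carries its own. *)
  let authors_ok (f : t) =
    f.authors <> [] || List.for_all (fun e -> e.Entry.authors <> []) f.entries
end
```

Putting entries and feeds in separate modules lets both use natural field names like `id` and `title` without clashing.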
To make my requirements more concrete:
I've honestly no idea how long this might take and I'm treating it as a side-project. I know there are many people out there who could produce a working version of everything in a week or two but I'm not one of them (yet). There are also a lot of ancillary things I need to learn on the way, like packaging, improving my knowledge of Git and dealing with build systems. If I had to put a vague time frame on this, I'd be thinking in months rather than weeks. It might even be the case that others start work on parts of this and ship things sooner but that's great as I'll probably be able to use whatever they create and move further along the chain.
In terms of workflow, everything will be done in the open, warts and all, and I expect to make embarrassing mistakes as I go. You can follow along on my freshly created OCaml Atom repo, and I'll be using the issue tracker as the main way of dealing with bugs and features. Let the fun begin.
I was at an event run by Cambridge Wireless today on the Internet of Things. These are my notes. Although I've named people, I'm paraphrasing a lot so best not to treat anything you read here as a direct quote.
There were a few audience polls conducted using little keypads they handed out. There were only 30-ish keypads so it's not necessarily indicative of how the whole room felt. I took some hasty pics of a few of the slides (mainly the poll results) so apologies for the poor quality.
John Hicklin, Principal Consultant | Commercial Enterprise Markets, CGI UK
Nowadays, you can walk into the C-suite and start a discussion about the IoT. They may not know what they're talking about but someone's told them they need to be thinking about it. But what is IoT?
One of the problems in industry: you go to your Director, explain this new tech and all the awesome things you can do with it, and he asks the dreaded question, "So what?" Take the example of the connected toothbrush. With a connected toothbrush you can automatically have a remote relationship with your dentist. But he already drives a BMW and it's not clear how he benefits from that. He wants you in the office where, every time you open your mouth, he sees new business development opportunities. For the moment, he neither needs nor wants a remote relationship with you. This leads to the question of who's going to sign up for the business case.
Number of key areas that need to be addressed (paraphrased):
Crack open the Data: We are awash with data but don't know how to use it. Problem with clients is that they really buy into the vision but they have a lot of legacy systems (green screens!) which already run things.
Identifying value(?): Volume is not a sign of success. Need to know where the value is, which means understanding it. e.g. Enron: thousands of documents, accountants all over the place with tons of data. It took a small team of a few researchers doing a case study to discover that no tax had ever been paid. They got to the right data and asked the right questions.
Connectivity(?): M2M has grown in a siloed manner, where a device takes data and sends it to one place. This needs to open up more and give access to other areas of business.
Business operations: People matter too. Process changes, systems that allow real-time decisions, empowering individuals.
Overall, IoT is an enabler for a number of wider trends. IoT can improve operational efficiency and enhance customer experience. Normally these are in conflict with each other. However, if we just get into a tech-fest, then we might end up with a large number of failures, as we've seen with 'Big Data'.
Poll - Which categories of business-related infrastructure are most urgently needed for building the Internet of Things?
David Dunn, Software Engineer, Electric Imp
Electric Imp aims to help people in Product Development in the IoT space, namely with the connectivity problem. The name 'IMP' came from 'Interface Message Processor'. The name is also from Terry Pratchett, whose stories had devices that contained an imp which did the work :)
Building connected devices is difficult. Have to be secure, easy to set up. There are many choices for connectivity, problems with managing the code on the devices. Electric Imp picked wifi and decided to do it well. This satisfies a lot of markets but not all.
Side story of the hacked fridge sending spam emails.
Things run through Electric Imp system and they manage the security. It's security as a service. Electric Imp manages deployment of your code too so developers can just push code and it will end up on all their devices.
Simplicity comes from one-step setup. It's optical: a smartphone flashes its screen, which sends the wifi SSID and credentials to the device (via a sensor). Customer code runs within a VM on top of a custom OS (impOS), which takes care of I/O (a TLS-encrypted wifi connection to the service).
Devices are difficult to debug in the field, so the cloud-based service helps with that. They do embedded stuff on the device and cloud stuff in the cloud. There's another VM in the cloud that's paired with the VM on the device. Heavy lifting can happen elsewhere, so there's no need to parse JSON on the device itself.
Q - How do you deal with identity of units?
A - Imps have unique serials and the devices they're plugged into are also unique (missed how).
Jon Lewis, Chief Innovation Officer, Plextek Consulting
Plextek was working in this field when it was called 'telemetry'. One of the earliest things they worked on was 'Lojack' (car tracking). Things then shifted to 'smart cities' and now people talk about 'IoT' and want consumer devices with consumer volumes.
2014 has got to be the year of IoT. However, we're still some way from business cases, as it's too expensive to simply track your cat (approx £100 set-up fee and then around £30 pcm).
Deployment often costs more than the sensor devices, cf. street lighting systems. Choice of radio system is critical. Projects in UK/Europe are about incremental cost savings with pay-back in less than 5yrs. In industrialising countries, there is new build, which is easier than retrofit. Urbanisation creates strain and demand. Local issues relate to spectrum and local regulations (e.g. a mandate to use govt encryption streams). Data privacy is also a concern as we don't know who holds the monkey. That's made even more difficult when you go abroad. Local politics also matters.
Georg Steimel, Head of M2M Solutions, Huawei
About Huawei:
- Three businesses: Carrier network, Consumer business, Enterprise business
- Revenues of $39Bn
- No. 2 telecom solution provider
- 150k+ employees worldwide
Georg is in Consumer branch for Europe. Huawei wants to be top M2M module vendor in the coming years. They're operator and system integrator agnostic.
Tech evolution brings new opportunities: mobile data traffic needs, network upgrading etc. The average user is expected to generate 1GB of data traffic per month (compared with 63MB today) and there are predicted to be 50Bn+ devices. Examples of users are highly varied, including monitoring of pregnant cows.
Challenges for mass rollout are a scattered market requiring complex solutions. Alignment of all parties in one adequate business model is quite tricky, esp when it comes to customer ownership. Should consider the value chain.
Complex business model:
Amount of data generated is also going to be huge. Smart meters in German households could send over 4TB per day, which is a huge strain for mobile networks.
LTE advantage for M2M is obviously bandwidth but that's only relevant to a few verticals (signage, routers, and in-car entertainment). Low latency is more important for industrial alarms and medical devices.
Poll - What business models?
Break down of poll results.
Fredrik Sjostedt, Vice President | Corporate Marketing, Barco Ltd
Barco does a lot of visualising information. Displays for radiographers, stuff in cinema screens, screens in flight simulators and a lot more.
Need to take in data from sensors, display relevant information and allow operators to make SMART decisions. e.g number of CCTV cameras in London - who's looking at them? Real 'Big Data' step is at the operator level.
Important to use standards, rather than proprietary systems.
Q - In transport visualisation do you have low enough latency to get
A - Yes, e.g. on a route in Brazil near São Paulo that's closed due to fog. Real-time info is important for understanding the whole route.
Justin Hill, Co-Head of the Patent Prosecution Group and Purvi Parekh, Co-Head of International Telecoms, Olswang
Purvi on the Regulatory environment
No tailor-made regulation v high regulatory focus. Regulatory focus from leading bodies, and many of them said that existing regulation can be amended to facilitate M2M/IoT. Made more complicated by the lack of a globally agreed definition of IoT (which makes it difficult for lawyers). There's generally promotion of Open Standards. The EU does believe that this is v important. People should read the RAND report (Nov 2013), commissioned by the EU to examine whether there should be standards.
Justin on IP issues
With any standardisation process, you're hoping to increase adoption, with the trade-off being the rights of IPR owners. FRAND (fair, reasonable and non-discriminatory) terms are also important, but that's also debatable. When considering how to protect IPR, it's worth comparing whole-system claims with protecting at different layers of the tech/value stack and building a portfolio.
Paul Copping, Corporate Development Director, TRL
TRL is the Transport Research Laboratory. They did some work to consider the impact of autonomous transport on the overall value chain. Some of the things we're trying now with autonomous driving have been successfully done in the past by other means (e.g. a car tracking a lane with magnets).
Pilgrim Beart, Founder Director, 1248-io
What is IoT? Humans don't scale. Population is predicted to tail off at around 9Bn. Connected devices are growing and the rate of change is exponential (and increasing). In the 90s we used to be our own sysadmin with maybe one device. Now we have several and some cloud services. In the future there will be many more and they'll have to manage themselves. There are different problems you face as you go from 1 to 10 to 100 to 10,000 to 1MM+ deployed devices, with issues ranging from device updates all the way to regulation and externalities that affect your business.
In 10 years time, when all is said and done, we won't be calling it the Internet of Things. It'll just be called the Internet.
Q - From an application developer's perspective, what would you do differently earlier in your development if you were to go back and start again?
A - AlertMe started a bit too early, so a lot of things were not off-the-shelf and had to be built. Right now I can see a lot of people reinventing the wheel, but they shouldn't need to do this.
Antony Rix, Senior Consultant, TTP
One of the things TTP works on are wireless standards.
The size of the grey blobs is an indication of market size.
Users will expect things to cost pretty much the same as they did before. Cost will be critical and as we buy things we generally still expect them to last for years.
TTP Matrix is a proprietary standard specifically for use by low-cost wireless connected devices. It's optimised for efficient use of radio spectrum, long range, low cost and power efficiency. An example use case (DisplayData) is pricing labels in supermarkets: 50k products running through only 3 gateways (other standards would find this difficult). Low power consumption means a battery can last 10 years.
Poll - Which (categories of) technical / architecture building blocks are most urgently needed for building the Internet of Things.
Interesting discussion of the role of humans in IoT. For home consumer devices, people may not want to make any decisions (completely automated), but corporates may want all the information in order to integrate and decide.
How quickly will this happen? It's difficult to say when you're in the middle of it; we'll know it's happened after the event. It takes about a decade for things to mature in the market. On the other hand, it's already happening.
Poll - Big and small data - Which statement do you agree with most/least
I only got a pic of the latter but it's the inverse of the former (unsurprisingly).
Mirage has reached a point where it's possible to easily set up end-to-end toolchains to build unikernels! My first use-case is to be able to generate a unikernel which can serve my personal static site but to do it with as much automation as possible. It turns out this is possible with less than 50 lines of code.
I use Jekyll and GitHub Pages at the moment so I wanted a workflow that's as easy to use, though I'm happy to spend some time up front to set up and configure things. The tools for achieving what I want are in good shape so this post takes the example of a Jekyll site (i.e this one) and goes through the steps to produce a unikernel on Travis CI (a continuous integration service) which can later be deployed. Many of these instructions already exist in various forms but they're collated here to aid this use-case.
I will take you, dear reader, through the process and when we're finished, the workflow will be as follows:
To achieve this, we'll first check that we can build a unikernel VM locally, then we'll set up a continuous integration service to automatically build them for us and finally we'll adapt the CI service to also deploy the built VM. Although the amount of code required is small, each of these steps is covered below in some detail. For simplicity, I'll assume you already have OCaml and Opam installed -- if not, you can find out how via the Real World OCaml install instructions.
To ensure that the build actually works, you should run things locally at least once before pushing to Travis. It's worth noting that the mirage-skeleton repo contains a lot of useful, public domain examples and, helpfully, the specific code we need is in `mirage-skeleton/static_website`. Copy both the `config.ml` and `dispatch.ml` files from that folder into a new `_mirage` folder in your Jekyll repository. Edit `config.ml` so that the two mentions of `./htdocs` are replaced with `../_site`. This is the only change you'll need to make and you should now be able to build the unikernel with the unix backend. Make sure you have the mirage package installed by running `$ opam install mirage` and then run:
(edit: If you already have `mirage`, remember to run `opam update` to make sure you've got the latest packages.)
```bash
$ cd _mirage
$ mirage configure --unix
$ make depend    # needed as of mirage 1.2 onward
$ mirage build
$ cd ..
```
That's all it takes! In a few minutes there will be a unikernel built on your system (symlinked as `_mirage/mir-www`). If there are any errors, make sure that Opam is up to date and that you have the latest version of the `static_website` files from mirage-skeleton.
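The one edit to `config.ml` described above can also be scripted, which is handy if you copy the skeleton files again later. This is a minimal sketch, assuming the skeleton's `config.ml` refers to the content directory by the literal string `./htdocs`. `point_at_site` is a hypothetical helper, not part of Mirage. It writes through a temporary file rather than using `sed -i`, because the syntax of `sed -i` differs between GNU and BSD `sed`.

```bash
#!/usr/bin/env bash
set -eu

# point_at_site FILE (hypothetical helper): replace every literal
# "./htdocs" in FILE with "../_site", so the unikernel serves Jekyll's
# generated output instead of the skeleton's sample pages.
point_at_site() {
  local file=$1 tmp
  tmp=$(mktemp)
  # Escape the dot so it only matches a literal "." character.
  sed 's|\./htdocs|../_site|g' "$file" > "$tmp"
  # Copy back with cat (not mv) so FILE keeps its original permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage, from the root of the Jekyll repository:
#   point_at_site _mirage/config.ml
```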
If you'd like to see this site locally, you can do so from within the `_mirage` folder by running the unikernel you just built. There's more information about the details of this on the Mirage docs site, but the quick instructions are:
```bash
$ cd _mirage
$ sudo mirage run

# in another terminal window
$ sudo ifconfig tap0 10.0.0.1 255.255.255.0
```
You can now point your browser at http://10.0.0.2/ and see your site!
Once you're finished browsing, `$ mirage clean` will clear up all the generated files.
Since the build is working locally, we can set up a continuous integration system to perform the builds for us.
We'll be using the Travis CI service, which is free for open-source projects (so this assumes you're using a public repo). The benefit of using Travis is that you can build a unikernel without needing a local OCaml environment, but it's always quicker to debug things locally.
Log in to Travis using your GitHub ID which will then trigger a scan of your repositories. When this is complete, go to your Travis accounts page and find the repo you'll be building the unikernel from. Switch it 'on' and Travis will automatically set your GitHub post-commit hook and token for you. That's all you need to do on the website.
When you next make a push to your repository, GitHub will inform Travis, which will then look for a YAML file in the root of the repo called `.travis.yml`. That file describes what Travis should do and what the build matrix is. Since OCaml is not one of the supported languages, we'll be writing our build script manually (this is actually easier than it sounds). First, let's set up the YAML file and then we'll examine the build script.
The Travis CI environment is based on Ubuntu 12.04, with a number of things pre-installed (e.g. Git, networking tools etc.). Travis doesn't support OCaml (yet), so we'll use the `c` environment and install the packages we need ourselves: the OCaml compiler, Opam and Mirage. Once those are set up, our build should run pretty much the same as it did locally.
For now, let's keep things simple and only focus on the latest releases (OCaml 4.01.0 and Opam 1.1.1), which means our build matrix is very simple.
The build instructions will be in the file `_mirage/travis.sh`, which the `.travis.yml` file will run after changing into the `_mirage` folder. This means our YAML file should look like:
```yaml
language: c
before_script: cd _mirage
script: bash -ex travis.sh
env:
  matrix:
    - MIRAGE_BACKEND=xen DEPLOY=0
    - MIRAGE_BACKEND=unix
```
The matrix enables us to have parallel builds for different environments, and this one is very simple as it's only building two unikernels. One worker will build for the Xen backend and another worker will build for the Unix backend. The `_mirage/travis.sh` script will clarify what each of these environments translates to. We'll come back to the `DEPLOY` flag later on (it's not necessary yet). Now that this file is set up, we can work on the build script itself.
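A mistyped backend in the build matrix could be made to fail loudly rather than produce an odd `mirage configure` call. One way is for the build script to map `MIRAGE_BACKEND` to a flag explicitly. This is a sketch under the assumption that `xen` and `unix` are the only backends in the matrix, as above. `configure_flag` is a hypothetical helper, not part of Mirage or Travis.

```bash
#!/usr/bin/env bash
set -eu

# configure_flag BACKEND (hypothetical helper): print the option to pass
# to `mirage configure` for one line of the build matrix, or fail.
configure_flag() {
  case "$1" in
    xen|unix) printf '%s\n' "--$1" ;;
    *) echo "unknown MIRAGE_BACKEND: '$1'" >&2; return 1 ;;
  esac
}

# In travis.sh this could replace the plain configure line. Assigning to
# a variable first means `set -e` stops the build if the lookup fails:
#   flag=$(configure_flag "$MIRAGE_BACKEND")
#   mirage configure "$flag"
#   mirage build
```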
To save time, we'll be using an Ubuntu PPA to quickly get pre-packaged versions of the OCaml compiler and Opam, so the first thing to do is define which PPAs each line of the build matrix corresponds to. Since we're keeping things simple, we only need one PPA that has the most recent releases of OCaml and Opam.
```bash
#!/usr/bin/env bash

ppa=avsm/ocaml41+opam11
echo "yes" | sudo add-apt-repository ppa:$ppa
sudo apt-get update -qq
sudo apt-get install -qq ocaml ocaml-native-compilers camlp4-extra opam
```
[NB: There are many other PPAs for different combinations of OCaml/Opam which are useful for testing]. Once the appropriate PPAs have been set up it's time to initialise Opam and install Mirage.
```bash
export OPAMYES=1
opam init
opam install mirage
eval `opam config env`
```
We set `OPAMYES=1` to get non-interactive use of Opam (it defaults to 'yes' for any user input) and, if we want full build logs, we could also set `OPAMVERBOSE=1` (I haven't in this example).
The rest should be straightforward and you'll end up with an Ubuntu machine with OCaml, Opam and the Mirage package installed. It's now trivial to do the next step of actually building the unikernel!
```bash
mirage configure --$MIRAGE_BACKEND
mirage build
```
You can see how we've used the environment variable from the Travis file, and this is where our two parallel builds begin to diverge. When you've saved this file, you'll need to change permissions to make it executable by doing `$ chmod +x _mirage/travis.sh`.
That's all you need to build the unikernel on Travis! You should now commit both the YAML file and the build script to the repo and push the changes to GitHub. Travis should automatically start your first build and you can watch the console output online to check that both the Xen and Unix backends complete properly. If you notice any errors, you should go back over your build script and fix it before the next step.
When Travis has finished its builds it will simply destroy the worker and all its contents, including the unikernels we just built. This is perfectly fine for testing but if we want to also deploy a unikernel, we need to get it out of the Travis worker after it's built. In this case, we want to extract the Xen-based unikernel so that we can later start it on a Xen-based machine (e.g Amazon, Rackspace or - in our case - a machine on Bytemark).
Since the unikernel VMs are small (only tens of MB), our method for exporting will be to commit the Xen unikernel into a repository on GitHub. It can be retrieved and started later on and keeping the VMs in version control gives us very effective snapshots (we can roll back the site without having to rebuild). This is something that would be much more challenging if we were using the 'standard' web toolstack.
The deployment step is a little more complex as we have to send the Travis worker a private SSH key, which will give it push access to a GitHub repository. Of course, we don't want to expose that key by simply adding it to the Travis file so we have to encrypt it somehow.
Travis supports encrypted environment variables. Each repository has its own public key and the Travis gem uses this public key to encrypt data, which you then add to your `.travis.yml` file for decryption by the worker. This is meant for sending things like private API tokens and other small amounts of data. Trying to encrypt an SSH key isn't going to work as it's too large. Instead we'll use `travis-senv`, which encodes, encrypts and chunks up the key into smaller pieces and then reassembles those pieces on the Travis worker. We still use the Travis gem to encrypt the pieces so we can add them to the `.travis.yml` file.
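The chunking idea can be made concrete with a small illustration of the general technique. A binary file is encoded as text and split into fixed-size pieces that could each live in its own variable. Later, the pieces are put back in order and decoded. This is *not* travis-senv's actual format or code: the encryption step is left out entirely, and the `PREFIX_<n>=` naming is invented for the example. It assumes GNU coreutils' `base64`.

```bash
#!/usr/bin/env bash
set -eu

# chunk_file FILE PREFIX SIZE: base64-encode FILE and print one
# PREFIX_<n>=<chunk> line for every SIZE characters of encoded text.
chunk_file() {
  local file=$1 prefix=$2 size=$3
  local n=0 chunk=""
  base64 < "$file" | tr -d '\n' | fold -w "$size" |
    # `|| [ -n "$chunk" ]` keeps the last piece, which has no newline.
    while IFS= read -r chunk || [ -n "$chunk" ]; do
      printf '%s_%d=%s\n' "$prefix" "$n" "$chunk"
      n=$((n + 1))
    done
}

# reassemble PREFIX: read PREFIX_<n>=<chunk> lines from stdin in any
# order, sort them by n, join the chunks and decode the original bytes.
reassemble() {
  local prefix=$1
  grep "^${prefix}_[0-9][0-9]*=" |
    sed "s/^${prefix}_\([0-9][0-9]*\)=/\1 /" |
    sort -n -k1,1 |
    cut -d' ' -f2- |
    tr -d '\n' |
    base64 -d
}
```

Ordering by an explicit index, rather than relying on line order, matters because the pieces end up as separate environment variables with no guaranteed order.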
While you could give Travis a key that accesses your whole GitHub account, my preference is to create a new deploy key, which will only be used for deployment to one repository.
```bash
# make a key pair on your local machine
$ cd ~/.ssh/
$ ssh-keygen -t dsa -C "travis.deploy" -f travis-deploy_dsa
$ cd -
```
Note that this is a 1024-bit key; if you decide to use a 2048-bit key instead, be aware that Travis sometimes has issues with it. Now that we have a key, we can encrypt it and add it to the Travis file.
```bash
# on your local machine
# install the necessary components
$ gem install travis
$ opam install travis-senv

# chunk the key, add to yml file and rm the intermediate
$ travis-senv encrypt ~/.ssh/travis-deploy_dsa _travis_env
$ cat _travis_env | travis encrypt -ps --add
$ rm _travis_env
```
`travis-senv` encrypts and chunks the key locally on your machine, placing its output in a file of your choosing (`_travis_env`). We then pipe that output file to the `travis` Ruby gem, asking it to encrypt the input, treating each line as a separate value to be appended (`-ps`), and then actually add the result to the Travis file (`--add`). You can run `$ travis encrypt -h` to understand these options. Once you've run the above commands, `.travis.yml` will look as follows.
```yaml
language: c
before_script: cd _mirage
script: bash -ex travis.sh
env:
  matrix:
    - MIRAGE_BACKEND=xen DEPLOY=0
    - MIRAGE_BACKEND=unix
  global:
    - secure: ".... encrypted data ...."
    - secure: ".... encrypted data ...."
    - secure: ".... encrypted data ...."
    ...
```
The number of secure variables added depends on the type and size of the key you had to chunk, so it could vary from 8 up to 29. We'll commit these additions later on, alongside additions to the build script.
At this point, we also need to make a repository on GitHub and add the public deploy key so that Travis can push to it. Once you've created your repo and added a README, follow GitHub's instructions on adding deploy keys and paste in the public key (i.e. the contents of `~/.ssh/travis-deploy_dsa.pub`).
Now that we can securely pass a private SSH key to the worker and have a repo that the worker can push to, we need to make additions to the build script.
Since we can set `DEPLOY=1` in the YAML file, we only need to make additions to the build script. Specifically, we want to ensure that only the Xen backend is deployed, and that only pushes to the repo result in deployments, not pull requests (we do still want builds for pull requests).
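These three conditions can be wrapped in a small predicate, which makes them easy to exercise locally before trusting them on Travis. `should_deploy` is a hypothetical helper, not a Travis feature. The sketch relies on Travis setting `TRAVIS_PULL_REQUEST` to `false` for push builds and to the pull request number otherwise. It treats an unset or empty `DEPLOY` (as on the unix line of the matrix) as `0`.

```bash
#!/usr/bin/env bash
set -eu

# should_deploy BACKEND DEPLOY PULL_REQUEST (hypothetical helper):
# succeed only for Xen builds with DEPLOY=1 that come from a push,
# i.e. when Travis reports TRAVIS_PULL_REQUEST=false.
should_deploy() {
  local backend=$1 deploy=${2:-0} pull_request=$3
  [ "$backend" = "xen" ] && [ "$deploy" = "1" ] && [ "$pull_request" = "false" ]
}

# Inside travis.sh, with the variables provided by the matrix and Travis:
#   if should_deploy "$MIRAGE_BACKEND" "${DEPLOY:-}" "$TRAVIS_PULL_REQUEST"; then
#     ...deployment steps...
#   fi
```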
In the build script (`_mirage/travis.sh`), which is being run by the worker, we'll have to reconstruct the SSH key and configure Git. In addition, Travis gives us a set of useful environment variables, so we'll use the latest commit hash (`$TRAVIS_COMMIT`) to name the VM (which also helps us trace which commit it was built from).
It's easier to consider this section of code all at once, so I've explained the details in the comments. This section is what you need to add at the end of your existing build script (i.e. straight after `mirage build`):
```bash
# Only deploy if the following conditions are met.
if [ "$MIRAGE_BACKEND" = "xen" \
        -a "$DEPLOY" = "1" \
        -a "$TRAVIS_PULL_REQUEST" = "false" ]; then

    # The Travis worker will already have access to the chunks
    # passed in via the yaml file. Now we need to reconstruct
    # the GitHub SSH key from those and set up the config file.
    opam install travis-senv
    mkdir -p ~/.ssh
    travis-senv decrypt > ~/.ssh/id_dsa   # This doesn't expose it
    chmod 600 ~/.ssh/id_dsa               # Owner can read and write
    echo "Host some_user github.com"      >> ~/.ssh/config
    echo "  Hostname github.com"          >> ~/.ssh/config
    echo "  StrictHostKeyChecking no"     >> ~/.ssh/config
    echo "  CheckHostIP no"               >> ~/.ssh/config
    echo "  UserKnownHostsFile=/dev/null" >> ~/.ssh/config

    # Configure the worker's git details,
    # otherwise git actions will fail.
    git config --global user.email "firstname.lastname@example.org"
    git config --global user.name "Travis Build Bot"

    # Do the actual work for deployment.
    # Clone the deployment repo. Notice the user,
    # which is the same as in the ~/.ssh/config file.
    git clone git@some_user:amirmc/www-test-deploy
    cd www-test-deploy

    # Make a folder named for the commit.
    # If we're rebuilding a VM from a previous
    # commit, then we need to clear the old one.
    # Then copy in both the config file and VM.
    rm -rf "$TRAVIS_COMMIT"
    mkdir -p "$TRAVIS_COMMIT"
    cp ../mir-www.xen ../config.ml "$TRAVIS_COMMIT"

    # Compress the VM and add a text file to note
    # the commit of the most recently built VM.
    bzip2 -9 "$TRAVIS_COMMIT/mir-www.xen"
    git pull --rebase
    echo "$TRAVIS_COMMIT" > latest   # update ref to most recent

    # Add, commit and push the changes!
    git add "$TRAVIS_COMMIT" latest
    git commit -m "adding $TRAVIS_COMMIT built for $MIRAGE_BACKEND"
    git push origin master
    # Go out and enjoy the Sun!
fi
```
At this point you should commit the changes to `.travis.yml` (don't forget to set the deploy flag, i.e. `DEPLOY=1`) and `_mirage/travis.sh`, and push the changes to GitHub. Everything else will take place automatically and in a few minutes you will have a unikernel ready to deploy on top of Xen!
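Each build lands in a folder named after its commit, and a `latest` file points at the newest one. So retrieving a VM, or rolling back to an earlier one, is just file handling in a clone of the deployment repo. This is a minimal sketch, assuming the layout produced by the script above (`<commit>/mir-www.xen.bz2` plus `latest`). The helper names are hypothetical, and booting the VM on your Xen host is out of scope here.

```bash
#!/usr/bin/env bash
set -eu

# latest_vm REPO: print the path of the compressed VM that REPO/latest
# points at, failing if that file is missing.
latest_vm() {
  local repo=$1 commit path
  commit=$(cat "$repo/latest")
  path="$repo/$commit/mir-www.xen.bz2"
  [ -f "$path" ] || { echo "no VM for commit $commit" >&2; return 1; }
  printf '%s\n' "$path"
}

# rollback_to REPO COMMIT: point REPO/latest at an earlier build that is
# already in the repo, so nothing has to be rebuilt.
rollback_to() {
  local repo=$1 commit=$2
  [ -f "$repo/$commit/mir-www.xen.bz2" ] || { echo "unknown build $commit" >&2; return 1; }
  echo "$commit" > "$repo/latest"
}

# Example: decompress the current VM, keeping the .bz2 in the repo.
#   bunzip2 -k "$(latest_vm www-test-deploy)"
```

After `rollback_to`, committing and pushing the updated `latest` file records the rollback in the repo's history like any other change.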
[Pro-tip: If you add `[skip ci]` anywhere in your commit message, Travis will skip the build for that commit. This is very useful if you're making minor changes, like updating a README.]
Since I'm still using Jekyll for my website, I made a short script in my Jekyll repository (`_deploy-unikernel.sh`) that builds the site, commits the `_site` folder and pushes to GitHub. I simply run this after I've committed a new blog post and the rest takes care of itself.
```bash
#!/usr/bin/env bash

jekyll build
git add _site
git commit -m 'update _site'
git push origin master
```
Congratulations! You now have an end-to-end workflow that will produce a unikernel VM from your Jekyll-based site and push it to a repo. If you strip out all the comments, you'll see that we've written fewer than 50 lines of code! Admittedly, I'm not counting the 80 or so lines that came for free in the `*.ml` files, but that's still pretty impressive.
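To check the line count for your own setup, you can count the lines that are neither blank nor comments. This is a rough sketch. It treats any line whose first non-blank character is `#` as a comment, which also drops shebang lines. It cannot tell a `#` inside a string from a real comment. `count_code_lines` is a hypothetical helper.

```bash
#!/usr/bin/env bash
set -eu

# count_code_lines FILE...: print how many lines across all FILEs are
# neither blank nor whole-line comments. `|| true` keeps a count of 0
# from being treated as an error (grep -c exits 1 when nothing matches).
count_code_lines() {
  cat "$@" | grep -c -v -E '^[[:space:]]*(#|$)' || true
}

# Usage, from the root of the Jekyll repository:
#   count_code_lines .travis.yml _mirage/travis.sh _deploy-unikernel.sh
```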
Of course, we still need a machine to take that VM and run it but that's a topic for another post. For the time-being, I'm still using GitHub Pages but once the VM is hosted somewhere, I will:
Although all the tools already exist to switch now, I'm taking my time so that I can easily maintain the code I end up using.
You may have noticed that the examples here are not very flexible or extensible but that was a deliberate choice to keep them readable. It's possible to do much more with the build matrix and script, as you can see from the Travis files on my website repo, which were based on those of the Mirage site and Mort's site. Specifically, you can note the use of more environment variables and case statements to decide which PPAs to grab. Once you've got your builds working, it's worth improving your scripts to make them more maintainable and cover the test cases you feel are important.
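As an example of the case-statement approach, the PPA could be chosen from matrix variables instead of being hard-coded. This sketch assumes hypothetical `OCAML_VERSION` and `OPAM_VERSION` matrix variables. It only maps the one combination used in this post to the `avsm/ocaml41+opam11` PPA. Any other entries would have to come from the PPAs you actually intend to test against.

```bash
#!/usr/bin/env bash
set -eu

# ppa_for OCAML_VERSION OPAM_VERSION (hypothetical helper): print the
# Ubuntu PPA that provides this OCaml/Opam combination, or fail.
# Add one pattern per combination in your build matrix.
ppa_for() {
  case "$1,$2" in
    4.01.0,1.1.1) echo "avsm/ocaml41+opam11" ;;
    *) echo "no PPA known for OCaml $1 / Opam $2" >&2; return 1 ;;
  esac
}

# In travis.sh the install step would then become:
#   ppa=$(ppa_for "$OCAML_VERSION" "$OPAM_VERSION")
#   echo "yes" | sudo add-apt-repository "ppa:$ppa"
```

Failing on an unknown combination, rather than falling back to a default PPA, makes a new matrix line that was added without a matching PPA stop the build early.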
You might have noticed that in very few places in the toolchain above have I mentioned anything specific to static sites per se. The workflow is simply (1) do some stuff locally, (2) push to a continuous integration service, which then (3) builds and deploys a Xen-based unikernel. Apart from the convenient folder structure, the specific work to treat this as a static site lives in the `*.ml` files, which I've skipped over for this post.
As such, the GitHub+Travis workflow we've developed here is quite general and will apply to almost any unikernel that we may want to construct. I encourage you to explore the examples in the mirage-skeleton repo and keep your build script maintainable. We'll be using it again the next time we build unikernel devices.
Acknowledgements: There were lots of things I read over while writing this post but there were a few particularly useful things that you should look up. Anil's posts on Testing with Travis and Travis for secure deployments are quite succinct (and were themselves prompted by Mike Lin's Travis post several months earlier). Looking over Mort's build script and that of mirage-www helped me figure out the deployment steps as well as improve my own script. Special thanks also to Daniel, Leo and Anil for commenting on an earlier draft of this post.