unhosted web apps

freedom from web 2.0's monopoly platforms

11. App hosting

Hosting is a telecommunications transport through space and time.

This blogpost goes from me to you. I am writing this at a Starbucks in Bukit Bintang, on a rainy Saturday. You are reading this in a different place, at a later time. Hosting is just a transport that makes this possible.

Even when consuming hosted content, we should not forget to design our internet applications with the end-to-end principle in mind. If we want the internet to empower people, then we should model all communication as taking place between people. This may disrupt a few assumptions that you will have been fed during the age of hosted software.

TLS is not secure unless combined with on-premises hosting

TLS (formerly known as SSL; the technology behind https:// URLs) stands for Transport Layer Security. It establishes a secure tunnel between the client and the server. Third parties can see that this tunnel exists, and roughly how much data travels through it, in which direction, and with which timing patterns, but they cannot see the unencrypted content that is being sent back and forth.

On top of TLS, the web uses a Public Key Infrastructure (PKI) based on Certificate Authorities (CAs). This system has its own problems, because it centralizes power, but that is not even what I am talking about here.

The client-server security architecture is, in itself, an outdated approach to security. It makes the assumption that the server is under the physical control of the person or organization publishing the content. This was probably true for 90% of the first websites, 20 years ago, but nowadays that percentage is probably closer to 10%, or even less. This means we should consider servers as intermediate hops, not end points, in the end-to-end communication model.

The problem is reasonably limited with co-located servers, where the data center hosting your server does not have the root password to it. In this case, any intrusion would at least be detectable, provided your co-located server has good sensors for "case open" and similar alerts.

But current-day Infrastructure-as-a-Service hosting provides virtualized servers, not physical ones. This is good because it allows the IaaS provider to move your server around for load-balancing and live resizing, but it also means that the host system has access to your private TLS key.

At a good IaaS company, only systems administrators under sysadmin oath, and the management above them, will have access to your server. Good managers at IaaS companies will use their access only to grant newly employed sysadmins the necessary access, and perhaps to let in a few nation-state governments, whom society trusts to spy on everybody in order to identify dangerous terrorists.

You can always recognize sysadmins because they will look away with a resolute gesture when you type your password in their presence. Even if it's the password that you just asked them to reset. It is in a sysadmin's professional genes to be very precise about security and privacy. Still, TLS-hosted websites can no longer be sensibly called end-to-end secure in the age of web 2.0 and "the cloud", since the communicating parties (Alice and Bob, as we often call them) do not have a secure channel between them.

So where does that leave us? Let's look specifically at app hosting.

Wait, isn't hosting an unhosted web app a contradiction? :)

Yes, of course, to be very precise, we should have called unhosted web apps 'statics-only web apps', since what we mean is 'web apps whose functionality is not produced by code that runs server-side, even though the source files of the app are still hosted as static content'. But that just didn't sound as intriguing. ;)

The web has no way of signing web apps. When hosting an unhosted ('statics-only') web app, we just serve each of the files that make up the app on a website, and the only way to know whether you are seeing the web app that the developer wanted you to see is if the webserver is under the physical control of the developer. This doesn't often happen, of course (especially with nomadic developers!), so in most cases we will have to compromise a little bit on security.

In practice, security is always relative, and unless you mirror the apps you use and host them yourself inside your LAN, in almost all cases you will have to trust at least one intermediate party: the app hoster. There is just one exception to this: RevProTun systems like Pagekite.

RevProTun and Pagekite

If you have a FreedomBox in your home, or you otherwise run an always-on server in a place that you physically control, then you can put the unhosted web apps you publish on there, hosted on a simple statics server with TLS. You can use a nodejs script for this, like the one we described in episode 3: setting up your personal server. Once you have a TLS-enabled website running on localhost, you can publish your localhost to the world using a RevProTun service. A good example is Pagekite, run by personal friends of mine who are also the main drivers behind the development of RevProTun. You open a tunnel account with them, install their client app in a place where it can "see" the localhost service you want to publish, and it will start delivering traffic from the outside world to it, functioning as a reverse proxy tunnel from your publicly accessible URL to your locally run, TLS-secured statics hosting service.
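As a concrete illustration, here is a minimal sketch of such a TLS-enabled statics server in nodejs; the certificate paths, the 'public' folder name and the port are hypothetical placeholders, and a real deployment would reuse the more complete script from episode 3:

```js
// minimal sketch of a TLS-enabled statics server on localhost;
// certificate paths, folder name and port are made-up examples.
const https = require('https');
const fs = require('fs');
const path = require('path');

const options = {
  key: fs.readFileSync('/etc/tls/example.com.key'),
  cert: fs.readFileSync('/etc/tls/example.com.crt')
};

const root = path.join(__dirname, 'public');

https.createServer(options, (req, res) => {
  // map the URL path onto the folder, with a naive guard against '../'
  const urlPath = decodeURIComponent(req.url.split('?')[0]);
  const filePath = path.normalize(path.join(root, urlPath));
  if (!filePath.startsWith(root)) {
    res.writeHead(403);
    return res.end('Forbidden');
  }
  fs.readFile(filePath, (err, data) => {
    if (err) {
      res.writeHead(404);
      return res.end('Not found');
    }
    res.writeHead(200);
    res.end(data);
  });
}).listen(8443, '127.0.0.1');
```

The Pagekite client can then be pointed at this local port, and it will relay traffic from your public URL to it.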

RevProTun is an important part of how we want to add Indie Web hosting to FreedomBox. These are still future plans for which we always struggle to find time, but we are determined to make a "freedom-compatible home hosting" solution like this available in a format that is also usable by end-users who may not want to know all the exact details of how reverse proxy tunnels work. If you do want to know more about them, a good place to start is Bjarni's FOSDEM talk about Pagekite. Unfortunately, it seems there has not been a lot of progress on the RevProTun standard since it was proposed in 2012.

How origins work

Each web app should be hosted on its own "origin", meaning its own unique combination of scheme, host and port.

The reason for this is that your browser will shield apps off from each other, putting each of them in its own limited sandbox. If you were to host two apps on the same origin, your browser would allow them to access each other's data (for instance, the data they store in localStorage), and things would generally just become a big mess. So don't ever try to do this. When hosting apps, host each one on its own origin.
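To make that concrete, here is a hypothetical illustration (the origin and key names are made up); two apps served from the same origin share one localStorage namespace:

```js
// hypothetical: both apps are served from https://apps.example.com/

// code running in https://apps.example.com/app1/index.html:
localStorage.setItem('sessionToken', 'secret-belonging-to-app1');

// code running in https://apps.example.com/app2/index.html
// can read it, because the origin (scheme, host, port) is the same:
console.log(localStorage.getItem('sessionToken'));
// -> 'secret-belonging-to-app1'
```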

This also means that, even though dropbox.js and GitHub are great tools for unhosted web app developers, you don't want to publish any unhosted web apps directly on https://dl.dropbox.com/, or host more than one app on each http://username.github.com/, unless perhaps they are apps that will not handle any user data (for short-lived and self-contained html5 games, for instance, this security consideration may be less important).

So suppose you have your Indie Web domain name and you want to publish some unhosted web apps to the world. For http hosting (without TLS), this is not so difficult to do. Remember, an origin is defined by scheme, host and port. The scheme of each app's origin will always be 'http', but you can point as many subdomain hosts to your IP address as you want, especially if you use a wild-card CNAME. Then you can just use http://app1.example.com/, http://app2.example.com/, etcetera as origins for the apps you host.
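For instance, a wild-card CNAME could look something like this in your zone file (a hypothetical BIND-style excerpt; the IP address is a documentation example):

```
; every subdomain of example.com resolves to the same server
example.com.    IN  A      203.0.113.7
*.example.com.  IN  CNAME  example.com.
```

With this in place, http://app1.example.com/ and http://app2.example.com/ both resolve to your server without any further DNS changes.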

If you want to do this properly, however, and get as much security as we can despite your server probably being hosted on insecure IaaS infrastructure, then you should always host apps on https. This brings into play the limitations on which origins your TLS cert will work for. Unless you own a wild-card certificate, you will probably only have two hosts, typically "something.example.com" and its parent, "example.com".

But luckily, origins are defined by scheme, host, and port. This means you can host tens of thousands of apps using the same cert; simply use origins like "https://apps.yourdomain.com:42001/", "https://apps.yourdomain.com:42002/", etcetera.
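A sketch of what that could look like in nodejs (the app names, cert paths and port numbers are made-up examples; a real version would serve each app's static files instead of a placeholder response):

```js
// one host, one TLS cert, one origin per port
const https = require('https');
const fs = require('fs');

const options = {
  key: fs.readFileSync('/etc/tls/apps.example.com.key'),
  cert: fs.readFileSync('/etc/tls/apps.example.com.crt')
};

const apps = { 42001: 'editor', 42002: 'todo-list' };

Object.keys(apps).forEach((port) => {
  https.createServer(options, (req, res) => {
    // a real server would serve the statics of apps[port] here
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('this origin serves the ' + apps[port] + ' app\n');
  }).listen(Number(port));
});
```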

Using SNI to host multiple TLS certs

Five or ten years ago, SNI didn't exist yet, and you had to get one dedicated IPv4 address for each and every TLS cert. Until recently I thought this was still the case, but as Bjarni explained to me the other day, this is now a resolved problem. Using SNI, you can host as many TLS certs on one virtual server as you like, without having to request extra IPv4 addresses and pay the extra fee for them. I hadn't switched to this myself yet, since I have published my unhosted web apps all through 5apps, but nowadays unhosted.org is SNI-hosted together with 3pp.io. I did have two IPv4 addresses on my server, one for nodejs and one for apache, and your server will need two IP addresses to do STUN in a few episodes from now (yes! we're going to play with PeerConnections soon), but in general, SNI is a good thing to know about when you're hosting apps.
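In nodejs, SNI-based virtual hosting can be sketched roughly like this (hostnames and cert paths are examples; the SNICallback option picks a certificate based on the hostname the browser announces during the TLS handshake):

```js
// sketch: one server, one IPv4 address, a different TLS cert per hostname
const https = require('https');
const tls = require('tls');
const fs = require('fs');

const contexts = {
  'unhosted.org': tls.createSecureContext({
    key: fs.readFileSync('/etc/tls/unhosted.org.key'),
    cert: fs.readFileSync('/etc/tls/unhosted.org.crt')
  }),
  '3pp.io': tls.createSecureContext({
    key: fs.readFileSync('/etc/tls/3pp.io.key'),
    cert: fs.readFileSync('/etc/tls/3pp.io.crt')
  })
};

https.createServer({
  // pick the matching certificate before the handshake completes
  SNICallback: (servername, cb) => {
    const ctx = contexts[servername];
    cb(ctx ? null : new Error('unknown host'), ctx);
  }
}, (req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('served via SNI for ' + req.headers.host + '\n');
}).listen(443);
```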

5apps

There is a very cool Berlin start-up, again also personal friends of mine, called 5apps. They specialize in app hosting for unhosted web apps. You may have seen them mentioned on our tools page, and most of our featured apps are hosted on 5apps. They just launched support for https hosting last week (using their wildcard TLS cert for *.5apps.com), and deploying your app to their hosting gives you a number of advantages over home-baking your own app hosting. They will automatically generate an appcache manifest for your app (more about this below), resize your icon to the various favicon formats for browsers as well as devices like the iPad, and generate the various file formats needed for submitting your app to review sites and app stores (more about that next week).

On top of that, they offer a number of things which "make your app nice", like browser compatibility detection, a user feedback widget, and JavaScript minification. Statics hosting is a mature market; you can find various companies who specialize in it. But the advantage of 5apps over, for instance, Amazon S3 is that 5apps specialize specifically in hosting unhosted web apps. And as with Pagekite, I can personally confirm that they are nice people, who care about re-decentralizing the web. :)

For completeness, let me mention that AppCloudy provide a similar product to 5apps, and there are probably some others. StackMob and Tiggzy also offer html5 app hosting as a sideline to their backend service for native mobile apps. As of September 2014, 5apps are the top hit for "html5 app hosting" on both DuckDuckGo and Google, but I will update this section as the market evolves (tips welcome!).

Mirroring and packaging

Since the early days of the web, important websites, especially if they contain big downloadable files, have been mirrored. An unhosted web app is statics-only, so it can be serialized as a folder tree containing files. This folder tree can be packaged as an FTP directory, a git repo, a zip file, a zip file with a '.crx' extension (as used by Chrome extensions), or a bittorrent, for example. If you transport it over one or more of those media, then it can easily be mirrored by more than one app hoster.

We call a statics-only web app that is in transit from one mirror to another in such a form a "packaged web app".

There is a fascinating aspect to the concept of such packaged web apps: they are robust against attacks on our DNS system.

Sometimes, governments block websites. In almost all cases this is probably done by people who think they are doing the right thing: they really, honestly think that Facebook is bad for Vietnam, or that Wikileaks is bad for the USA, or that ThePirateBay is bad for the Netherlands, and that such censorship is ultimately in the interest of the people who installed them as government employees.

Or maybe they honestly think the interests of Communism, or the interests of National Security, or the interests of the Entertainment Industry are more important than considerations about freedom of communication.

Whatever the reasons for such actions, and whatever the moral judgement on whether Facebook, Wikileaks and ThePirateBay are evil websites, from the technological design principle of "kill-switch resilience" it would be nice if we had a way to publish unhosted web apps in a decentralized way.

It would also just be nice from a practical point of view to have a local copy of a lot of information which we often look up online. I am writing this right now in a place without wifi, and so far I have made five notes of things I should look up on MDN later. If I had a local copy of MDN, that would make me a lot less reliant on connectivity.

As an example, let's take Wikipedia. A very big chunk of all human knowledge is on there. You can download a compressed xml file containing all the text (not the images) of the English edition. It's about 9 Gigabytes. Massaging this a bit, and adding the necessary rendering code, we could make this into an unhosted web app that can be run on localhost, using a simple statics hosting script like the ones we've seen in previous episodes.

The obvious way to distribute such a multi-Gigabyte file would be bittorrent. So that sounds totally feasible: we put huge unhosted web apps into torrent files, and everybody can cooperate in seeding them. We could do this for Wikipedia, MDN, Khan Academy, Open Yale Courses, this year's EdgeConf panel discussions, any knowledge you would like all humans to have access to. Given how cheap a 1 Terabyte external hard drive is nowadays (about 80 USD), you could even unhost the entire Gutenberg project, and still have some disk space left.

And of course, you could use this for small apps as well. Even if unhosted web apps cache themselves in your browser when you use them, it would be nice to have an app server somewhere inside your LAN that just hosts thousands of unhosted web apps on local URLs, so that you can access them quickly without relying on connectivity.

This is an idea that I started thinking about a couple of weeks ago, and then Nick found out that there is a project that does exactly this! :) It's called Kiwix. They use the ZIM file format, which is optimized for wiki content. They also provide an application that discovers and retrieves ZIM files, and serves them up on port 8000, so you can open a mirrored version of, for instance, a packaged website about Venezuela on one of your many localhost origins, such as http://127.13.37.123:8000/wikipedia_es_venezuela_11_2012/ (remember the localhost IP space is a "/8", as they say: it covers any IP address that starts with "127.", so there are plenty of origins there to host many apps on).
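A sketch of how a local app server could hand out one loopback origin per packaged app (the addresses are arbitrary examples; on Linux, the whole 127.0.0.0/8 range is routed to the loopback interface by default):

```js
// each packaged app gets its own loopback host, and thus its own origin
const http = require('http');

http.createServer((req, res) => {
  res.end('packaged app number one');
}).listen(8000, '127.13.37.123');

http.createServer((req, res) => {
  res.end('packaged app number two');
}).listen(8000, '127.13.37.124'); // same port, different host: different origin
```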

Since then, Rahul Kondi and I built Fizno, a torrent-based "physical node", at a hackathon in Bangalore, and later I discovered the existence of the excellent LibraryBox build-or-buy project.

Packaged app hosting conventions

There are a few conventions that you should keep in mind when writing a webserver that hosts static content. Most of these are well known in web engineering folklore, and if you use software like Apache or node-static, then this will all just work automatically by default.

A packaged web app, regardless of whether it is packaged into a torrent file, a zip file or a git repository, is a folder with documents and/or subfolders. Each subfolder can again contain more documents and subfolders. No two items in the same folder can have the same name, names are case-sensitive, and although in theory all UTF-8 characters should be allowed in them, you should be aware of the escape sequences required by the format into which you are packing them up. The documents in this folder structure map onto documents that should be made available over http 1.1, https, http 2.0 and/or SPDY, by splitting the URL path along the forward slashes and mapping it onto the folder structure, again taking into account the escape sequences defined for URIs and IRIs.

The file extension determines which 'Content-Type' header should be sent. A few examples: '.html' maps to 'text/html', '.js' to 'application/javascript', '.css' to 'text/css', '.png' to 'image/png', and '.svg' to 'image/svg+xml'.

For text-based content types, the server should also indicate a character set (usually UTF-8) as part of the Content-Type header, and then make sure it actually serves the contents of each document in that character set.

To allow the browser to cache the files you serve, and then retrieve them conditionally, you should implement support for ETag and If-None-Match headers.

Whenever a URL maps to a folder (either with or without a forward slash at the end) instead of to a document, if an index.html document exists in that folder, you should serve that.

If the requested document does not exist, but one exists whose full path differs only in case (e.g. foo/bar.html was requested, but only Foo/Bar.html exists), then redirect to that.

Apart from that, if a URL maps to no existing item, serve either the '404.html' document from the root folder, if it exists, or some standard 404 page, and in that case of course send a 404 status instead of a 200 status.

Additionally, implementing support for byte ranges is nice to have, but not essential. The sketch below puts several of these conventions together.
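This is a minimal sketch only, assuming a made-up 'root' folder; it implements extension-based Content-Type, index.html for folders, ETag/If-None-Match, and a 404.html fallback, and leaves out case-insensitive redirects, byte ranges and path-traversal guarding for brevity (in practice you would just use Apache or node-static):

```js
const http = require('http');
const fs = require('fs');
const path = require('path');
const crypto = require('crypto');

const root = path.join(__dirname, 'root');
const types = {
  '.html': 'text/html; charset=utf-8',
  '.js': 'application/javascript; charset=utf-8',
  '.css': 'text/css; charset=utf-8',
  '.png': 'image/png',
  '.svg': 'image/svg+xml; charset=utf-8'
};

function etagFor(data) {
  return '"' + crypto.createHash('sha1').update(data).digest('hex') + '"';
}

function send(res, status, filePath, data) {
  const type = types[path.extname(filePath)] || 'application/octet-stream';
  res.writeHead(status, { 'Content-Type': type, 'ETag': etagFor(data) });
  res.end(data);
}

http.createServer((req, res) => {
  let filePath = path.join(root, decodeURIComponent(req.url.split('?')[0]));
  fs.stat(filePath, (err, stats) => {
    if (!err && stats.isDirectory()) {
      filePath = path.join(filePath, 'index.html'); // folder -> index.html
    }
    fs.readFile(filePath, (err2, data) => {
      if (err2) {
        // fall back to 404.html from the root folder, if it exists
        fs.readFile(path.join(root, '404.html'), (err3, page) => {
          send(res, 404, '404.html', err3 ? Buffer.from('Not found') : page);
        });
        return;
      }
      if (req.headers['if-none-match'] === etagFor(data)) {
        res.writeHead(304); // the client already has this exact version
        return res.end();
      }
      send(res, 200, filePath, data);
    });
  });
}).listen(8080);
```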

Caching and the end-to-end principle

Appcache is a way for a web app to tell the browser to pro-actively cache parts of it. It is still a relatively new technology, and if you want to know how it earned its nickname of 'appdouche cachebag', you should definitely watch at least the first 10 minutes of this video:

(video: http://www.youtube.com/embed/Oic22dQMRXQ)

An improved successor to Appcache is being prepared with ServiceWorkers, but these are not yet usable in mainstream browsers.

One good thing about appcache is that, like the browser's http cache, it works end-to-end. The web of documents was designed with caching of GET requests in mind. A lot of hosted web apps will cache the content produced by their webservers using something like squid or varnish, behind an https offloading layer. Often, this will be outsourced to a CDN. Researchers once theorized that multicasting would become big on the internet after that, but this didn't really happen, at least not for web content. Instead, content providers like youtube and CDNs like Akamai extended their tentacles right into the exchange hubs where they peer with ISPs. In a sense, we do have the multicast backbone that academia tried to develop, but it is now privately owned by companies like the two I just mentioned.

So once the content reaches the ISP, it will often be cached again by the ISP, before it is delivered to the Last Mile. None of this works when you use end-to-end encryption, though. Multicasting traffic is a lot like deduplicating content-addressable content, only harder because of the time constraints. And as the rule of thumb goes, out of 1) end-to-end encryption, 2) multicast(/deduplication), and 3) making sense, you can only pick two. :)

Using convergent encryption, a form of encrypted multicast would be possible if you only care about protecting the contents of a secret document, but this would still allow eavesdroppers on the same multicast tree to discover which documents you are streaming, so such a setup wouldn't be as private as an https connection.
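To make the idea of convergent encryption concrete, here is a minimal sketch in nodejs (deriving both the key and the IV from the content is just one common construction, not a specific standard):

```js
// convergent encryption: key and IV are derived from the content itself,
// so identical plaintexts always yield identical ciphertexts
const crypto = require('crypto');

function convergentEncrypt(plaintext) {
  // the encryption key is simply the hash of the content
  const key = crypto.createHash('sha256').update(plaintext).digest();
  // derive the IV deterministically too, so encryption stays convergent
  const iv = crypto.createHash('sha256').update(key).digest().slice(0, 16);
  const cipher = crypto.createCipheriv('aes-256-cbc', key, iv);
  return Buffer.concat([cipher.update(plaintext), cipher.final()]);
}

const a = convergentEncrypt(Buffer.from('the secret document'));
const b = convergentEncrypt(Buffer.from('the secret document'));
console.log(a.equals(b)); // true: identical, so cacheable and deduplicable,
                          // but recognizable to anyone who can guess the document
```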

This means that on https, any request that misses cache on your device will have to make the round trip all the way to the app hoster. Mirroring can decrease the distance between the app hoster and the user, but whenever that distance causes page load times to be noticeable (say, multiple tenths of a second), the thing you will want to use is appcache.

The appcache discussion

First of all, when you use appcache you should cache only the "shell" of your app (all of its layout and functionality), not any content it may include (this is also explained in the video above). For caching content that may be relevant this session, but will already have been replaced the next time the user opens the app, you may want to implement your own custom caching system inside your app, using for instance PouchDB or LawnChair. A sketch of such a shell-only manifest follows below.
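A hypothetical manifest for such an app shell could look like this (the file names are made up; the manifest is referenced from the html tag, as in <html manifest="shell.appcache">):

```
CACHE MANIFEST
# v1 -- bump this comment to force clients to re-fetch the shell

CACHE:
index.html
style.css
app.js
icon.png

NETWORK:
# everything else (user content) is fetched over the network
*
```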

As is also mentioned in the video, Mozilla and Google are cooperating to 'fix appcache'. There will probably be a more powerful API with fewer surprises.

The result of using appcache, though, once you get used to deployed changes not showing up on the first refresh, is amazing. Your apps will Just Work, even if there is no wifi connection. And if there is a wifi connection, then they will load very fast, much faster of course than if you didn't cache them.

Conclusion

Sorry if this episode was a bit long, but I think this covers most of the topics you have to keep in mind about app hosting. One of the nice things about unhosted web apps is that they are so easy to host: you don't have to set up a backend with database servers, etcetera. This also makes them very cheap to host, compared to hosted web apps (in the sense of 'dynamic' web content, as opposed to 'static' web content).

Unhosted web apps are also easier to mirror for kill-switch resilience, and they are very suitable as a cross-platform alternative to native smartphone and tablet apps.

Next week we will put together what we discussed about linking and hosting, and talk about how with unhosted web apps, the web itself becomes an app store. See you then! :)

comments welcome!

Next: App discovery