unhosted web apps

freedom from web 2.0's monopoly platforms

14. Peer-to-peer communication


To establish communication between two people, one of them generally has to have a way to find the other. In a client/server situation, it is generally the client who finds the server: starting from a domain name, through a DNS lookup, the client establishes an outgoing TCP connection, routed over IP, to a TCP port on which the server is listening. This requires the server to be listening on a port that was agreed beforehand (probably the default port for a certain protocol, or one included alongside the server's domain name in a URL), and it requires the server to be online, on the IP address advertised in DNS, at the time the client initiates the communication.

Suppose we were to use this system for peer-to-peer communication, where the caller and the callee have a similar device setup. Unless the callee stays on one static IPv4 address, they would then have to use dynamic DNS to keep their IP address up to date, and would also need to be able to open a public port on their device.

While the first is often possible, the second often is not, especially when you're on the move. An easy solution for this is to route all incoming traffic to an unhosted web app through sockethub.

Send me anything

We already used webfinger in episode 7 to announce the location of a user's remoteStorage server. We can easily add an extra line to the user's webfinger record, announcing a URL on which the user may be contacted. If the goal is to send the user a message (a string of bytes), then we can use an HTTP POST for that, or a WebSocket. The benefit of using a WebSocket is that a two-way channel stays open, over which real-time communication (WebRTC) can be established, if it is upgraded from a WebSocket to a PeerConnection. Here is an example webfinger record that announces a 'post-me-anything' link for HTTP POSTs, and a 'webrtc' link for establishing a WebRTC control channel:

  "subject": "acct:anything@michielbdejong.com",
  "aliases": [ "https://michielbdejong.com/" ],
  "links": [
    { "rel": "post-me-anything",
        "href": "https://michielbdejong.com:11547/" },
    { "rel": "webrtc",
        "href":"wss://michielbdejong.com:11548/" }

Both would probably be an endpoint on your personal server, not directly on your device, since your device is usually not publicly addressable. But using the 'webrtc' platform module of the adventures fork of sockethub, an unhosted web app can instruct sockethub to open a public WebSocket on a pre-configured port.
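As a sketch of how an app could consume such a record, here is a small helper that picks a link of a given rel type out of a parsed webfinger record. The `getLink` function and the `record` variable are illustrations of our own, not part of any library:

```javascript
// Sketch: extract a link of a given rel type from a parsed webfinger
// record (field names as in the example record above).
function getLink(record, rel) {
  var links = record.links || [];
  for (var i = 0; i < links.length; i++) {
    if (links[i].rel === rel) {
      return links[i].href;
    }
  }
  return null;
}

// Usage with the example record from above:
var record = {
  subject: 'acct:anything@michielbdejong.com',
  links: [
    { rel: 'post-me-anything', href: 'https://michielbdejong.com:11547/' },
    { rel: 'webrtc', href: 'wss://michielbdejong.com:11548/' }
  ]
};
var postUrl = getLink(record, 'post-me-anything'); // the HTTP POST endpoint
var rtcUrl = getLink(record, 'webrtc');            // the WebSocket endpoint
```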

Any traffic coming in over this WebSocket will then transparently be forwarded to the unhosted web app, and in the reverse direction, the unhosted web app can send commands to sockethub that will be forwarded back to the contacting peer. In that sense it's very much like Pagekite: a sort of reverse proxy tunnel. For the post-me-anything handler you could use a variation on the POST handler we set up in episode 3, when we discussed file sharing.
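A minimal sketch of what sending to someone's 'post-me-anything' endpoint could look like. Only the transport (an HTTP POST to the href announced in the webfinger record) is given by the text above; the JSON envelope with `from` and `body` fields is an invention of ours for illustration:

```javascript
// Hypothetical message envelope; the receiver just gets a byte string,
// so sender and receiver have to agree on a format like this one.
function buildEnvelope(fromHint, text) {
  return JSON.stringify({ from: fromHint, body: text });
}

// POST the envelope to the 'post-me-anything' href from the
// webfinger record.
function postMeAnything(url, envelope, onDone) {
  var xhr = new XMLHttpRequest();
  xhr.open('POST', url, true);
  xhr.setRequestHeader('Content-Type', 'application/json');
  xhr.onload = function () { onDone(xhr.status); };
  xhr.send(envelope);
}

// Example call (commented out; requires a browser and a live endpoint):
// postMeAnything('https://michielbdejong.com:11547/',
//     buildEnvelope('acct:me@example.com', 'hello there'),
//     function (status) { console.log('server replied', status); });
```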

A small example script allows you to tell sockethub that you are online and willing to take incoming calls, and also to check if a certain webfinger user is online, and chat with them if they are.
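The commands such a script sends to sockethub might look roughly like the following. The exact verbs and field names of the 'webrtc' platform module are assumptions on our part, so treat these objects as illustrative shapes only:

```javascript
// Hypothetical sockethub command shapes (verbs and fields assumed,
// not taken from the sockethub documentation).

// Tell sockethub to open a public WebSocket and accept incoming calls:
function listenCommand(port) {
  return { platform: 'webrtc', verb: 'listen', object: { port: port } };
}

// Send a chat message back out to a contacting peer:
function messageCommand(peer, text) {
  return { platform: 'webrtc', verb: 'send',
           target: { address: peer }, object: { text: text } };
}

// An app would JSON-encode these and send them over its own WebSocket
// connection to sockethub, e.g.:
//   sock.send(JSON.stringify(listenCommand(11548)));
```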

So this simple demo app uses webfinger and sockethub to implement a simple text chat. It's still not very usable, since sockethub was not really designed to listen on ports; this means you can have only one chat conversation at a time. But we'll work on that and will eventually get it working as a proper chat app, with an addressbook and everything.

Caller ID

So, combining WebSockets with webfinger, participating users can now send byte strings to each other. Anybody can send any byte string to anybody, just like in email. But unlike email, right now there is no way to claim or prove a sender identity; every caller appears as an anonymous peer.

The receiver will be able to reply to the real sender in the HTTP response, or for as long as the caller keeps the WebSocket open. But other than that, you would have to tell the callee in words who you are, and also convince them in words that you actually are who you say you are. Both these things could be automated; for instance, the sockethub server could accept dialback authentication.

Dialback is a work in progress whose draft has currently expired, but it's a very simple way to find out whether a request is from the person it claims to be from, without the need to implement any cryptography, either on the identity provider side or on the relying party side.
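The control flow of dialback can be sketched like this. The real draft specifies exact endpoints and parameter names; here the `askHost` callback just stands in for an HTTPS request back to the claimed sender's host, so the flow can be shown (and tested) without a network:

```javascript
// Sketch of the dialback idea: the receiver takes the claimed sender
// and a token from the incoming request, and asks the claimed sender's
// host whether it really issued that token. No cryptography is needed
// on either side.
function isTrustworthy(claimedSender, token, askHost) {
  // 'alice@example.com' -> ask 'example.com' about the token.
  var domain = claimedSender.split('@')[1];
  return askHost(domain, token) === true;
}

// A real askHost would make an https request to the domain's dialback
// endpoint; here it is just a function parameter.
```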

Other options would be using PGP signatures inside the message, or WebID client certificates at the transport level, but both would probably require the help of the sender's personal server, or the sender's browser would have to be improved to better support such features.

Also, all caller ID systems would break down if the caller's device has been compromised. That problem is usually left out of scope in technical solutions to caller identification.

But in general, what we see on the PGP-less email platform is that being able to receive a message from any stranger is a feature, and if that stranger claims to be 'ebay-central-support.com', then the user could still be tricked into thinking the message is from ebay, even with a "secure" caller ID system in place.

Likewise, on irc we usually assume that nobody is going to impersonate other people in a public real-time chat, and we often take the content of the message itself as effectively sufficient proof of the sender's identity.

The universal language problem and the polyglot solution

If a message arrives as a finite byte string in your inbox, then how do you know how to interpret it? If it came in over HTTP, then a Content-Type header may have been included in the request, which will definitely help in most cases. Usually, inspecting the first few bytes, or the filename extension, can also give some hints as to what the format of the message might be. But that does not take away the fact that these are hints and tricks derived from common practice.

It is, in a way, only an extension to the collection of human languages which are common knowledge between sender and receiver. Take for instance a file you find on a dead drop cache. These are USB drives in public spaces where anyone can leave or retrieve files for whoever finds them. Each file on there is just a string of bytes, and there is no out-of-band channel between the sender and the receiver to give context, other than the knowledge that the sender is probably human and from planet Earth. The filename extension, and other markers (you could see such metadata as just part of the byte string making up the message) are conventions which are common knowledge among human app developers on this planet, so an app can often make sense of many types of files, or at least detect their format and display a meaningful error message.

The problem becomes even more interesting if you look at the ultimate dead drop cache, the Voyager Golden Records which Carl Sagan et al. sent into outer space. Hofstadter explains this wonderfully in the book "Gödel Escher Bach", alongside many other fundamental concepts of information and intelligence, as the fundamental impossibility of a "universal language". If there is no common knowledge between sender and receiver, then the receiver doesn't even have a way to extract the byte string out of the physical object, or even to understand that the physical object was intended to convey a message.

In practice, even though we created a way here for the receiver to know that someone wanted them to receive a certain string of bytes, it is from there on up to the app developer to implement heuristics, based on common practice, so that messages in plain utf8 human languages, html, various image, audio and video formats, and maybe formats like pdf and odf, can all be displayed meaningfully to the user.

There are many such well-defined document formats that are independent of messaging network, data transport, and to some extent even display medium (e.g. print vs screen). But I cannot think of any that's not based on the habit of transporting and storing documents as byte strings. And many of these document formats have been designed to be easy to recognize and hard to confuse, often with some unique markers in the first few bytes of the file.
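As a sketch of such first-few-bytes detection, here is a minimal format sniffer for three well-known magic numbers. A real app would check many more formats, but the principle is the same: these markers are common knowledge between sender and receiver:

```javascript
// Guess a document format from its first few bytes, using well-known
// magic numbers: png starts with 0x89 'PNG', jpeg with 0xff 0xd8,
// and pdf with the ASCII string '%PDF'.
function sniffFormat(bytes) {
  if (bytes.length >= 4 &&
      bytes[0] === 0x89 && bytes[1] === 0x50 &&
      bytes[2] === 0x4e && bytes[3] === 0x47) {
    return 'png';
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xd8) {
    return 'jpeg';
  }
  if (bytes.length >= 4 &&
      bytes[0] === 0x25 && bytes[1] === 0x50 &&   // '%P'
      bytes[2] === 0x44 && bytes[3] === 0x46) {   // 'DF'
    return 'pdf';
  }
  return 'unknown';
}
```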

So an app that receives "any message" that is sent to the user should take a polyglot approach when trying to display or interpret the message: try to accept as many languages as possible, rather than trying to push for one "winner" language. That way we can separate the document format from the messaging platform.

Using PeerConnection

One thing you may want to send to another user over a WebSocket is an offer for a WebRTC PeerConnection, because making all traffic go through the callee's sockethub server is not really peer-to-peer. But that's at the same time an important point to make in general: unless you use a distributed hash table to establish a peer-to-peer messaging session, the first handshake for such a session always starts with a request to a publicly addressable server.

However, using the new PeerConnection technology, it is possible to upgrade to a shortcut route, once first contact has been made. I attempted to write a caller and callee script to demo this, but ran into some problems where both Firefox Nightly and Chrome give an error which I wasn't expecting from the documentation.

I'm sure I just made some silly mistake somewhere, though. This is all still quite new ground to explore; you will probably find some interesting up-to-date demo projects when searching the web. The main point I want to make, and repeat, is that PeerConnection is a way to establish a shortcut route between two peers, once first contact has already been established. This will lead to lower latency, and will enable the use of end-to-end encryption between the two peers. So it's really quite valuable. But it is not a serverless protocol.
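With current browser APIs, the caller side of such an upgrade could be sketched roughly as follows. The signalling envelope produced by `wrapSignal` is our own invention, and `sock` stands for the already-open WebSocket to the peer; the `RTCPeerConnection` calls are the browser's WebRTC API:

```javascript
// Hypothetical signalling envelope for relaying WebRTC handshake
// messages over the existing WebSocket channel.
function wrapSignal(type, payload) {
  return JSON.stringify({ webrtcSignal: type, payload: payload });
}
function unwrapSignal(str) {
  return JSON.parse(str);
}

// Caller side: create a PeerConnection, and relay the offer and ICE
// candidates to the peer over the already-open WebSocket 'sock'.
function startCall(sock) {
  var pc = new RTCPeerConnection();
  pc.onicecandidate = function (e) {
    if (e.candidate) {
      sock.send(wrapSignal('candidate', e.candidate));
    }
  };
  pc.createOffer().then(function (offer) {
    return pc.setLocalDescription(offer);
  }).then(function () {
    sock.send(wrapSignal('offer', pc.localDescription));
  });
  return pc;
}
// The callee would unwrapSignal() incoming messages, call
// setRemoteDescription() with the offer, and answer in the same way.
```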

Eventually, when we get this working, we will be able to replace Skype, and use an unhosted web app (with a little help from sockethub) instead. Next week we will have a look at one last piece in the puzzle to complete what I think of as the first part of this handbook for the No Cookie Crew: after having dumped GMail, Dropbox, Google Docs, Spotify, Facebook, Twitter, Flickr, and with this episode also having opened the way for dumping Skype, the last cookied platform we will need to replace in order to cover "the basics" is github. I'm already looking forward to that one, hope you are too.

comments welcome!

Next: Unhosted web apps and OAuth