unhosted web apps

freedom from web 2.0's monopoly platforms

13. Dealing with users in unhosted web apps

Database sharding

When you design a hosted web app, all data about all users goes into your database. You probably have a users table that contains the password hashes for all valid usernames on your website, and maybe some profile information about each user, like email address and full name, as supplied when the user signed up. If you allow users to sign in with Persona, Facebook, Twitter, Google, or GitHub, then your users table would have columns linking to the user's identifier at their identity provider.

Depending on the type of application, there will be various other tables with other kinds of data, some of which will be keyed per user, and some of which will not. For instance, you may have a photo_albums table, where each photo album is owned by exactly one user, so that the owner_id field in the photo_albums table refers to the user_id of the owning user in the users table.

Other tables may be unrelated to specific users. For instance, you may have a cities table with information about cities where some sort of events can take place in your app, but that table would not have an owner_id field like the photo_albums table does.

Tables where each row is clearly owned by, or related to, a specific user-id can be sharded by user-id. This means you split up the table into, say, 100 smaller tables, and store photo albums owned by a user-id ending in, say, '37' in the table named photo_albums37. For small websites this is not necessary, but as soon as you start having about a million rows per table, you need to shard your application's database tables in this way.

If you use Oracle as your database software, then it will do this transparently for you. But most hosted web applications use a database such as MySQL, for which you have to do the sharding in your backend code. Once a table is sharded, you can no longer do full-table searches or JOINs across it; each query to a sharded table has to have a 'WHERE owner_id=...' clause, so that it can be routed to a single shard. I learned this while working as a scalability engineer at Tuenti, before starting to think about unhosted web apps.
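As a rough illustration of what sharding in backend code can look like, the sketch below picks a shard table from the last two digits of a numeric user-id and builds a per-user query. Only photo_albums, owner_id and the 100-way split come from the text above; the function names, the id and title columns, the '?' placeholder style and the zero-padding of the suffix are assumptions made for this example.

```typescript
// Minimal sketch: choose one of 100 shard tables from the user id.
// shardTableName and albumsQuery are hypothetical helper names.
const SHARD_COUNT = 100;

function shardTableName(baseTable: string, userId: number): string {
  if (!isFinite(userId) || Math.floor(userId) !== userId || userId < 0) {
    throw new Error(`invalid user id: ${userId}`);
  }
  // userId % 100 gives the last two decimal digits, e.g. 1237 -> 37.
  const shard = userId % SHARD_COUNT;
  // Zero-pad (an assumption) so that user 1205 maps to photo_albums05.
  const suffix = shard < 10 ? `0${shard}` : String(shard);
  return `${baseTable}${suffix}`;
}

interface ShardedQuery {
  sql: string;
  params: number[];
}

// Every query carries the owner id, so it can be sent to exactly one shard.
function albumsQuery(userId: number): ShardedQuery {
  const table = shardTableName("photo_albums", userId);
  return {
    sql: `SELECT id, title FROM ${table} WHERE owner_id = ?`,
    params: [userId],
  };
}
```

Note that a query without a user-id has no single table to go to, which is why full-table searches and JOINs stop being practical.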

Per-user data

In a big hosted web app, data is generally stored and accessed per-user, with no possibility to take advantage of the fact that you have all the data of all users in one centralized place. Only generic information (like the cities database) and search indexes will usually be designed in a way that does not shard the database architecture per-user.

It was this realization that made me think that per-user data storage could be possible, and through conversations with Kenny (the original programmer of Tuenti's website), I got the idea to start this research project. Since that day, and alongside many people who joined the 'unhosted web apps' movement that it grew into, I have dedicated myself to trying to save the web from web 2.0's platform monopolies, whose business model is based on putting the data of as many users as possible in one centralized place that they control.

As a group, we then developed remoteStorage as a way for unhosted web apps to store data of the current user in a place that this user decides at run-time. This does not, however, say anything about how the app can refer to other users, and to data owned by those others.

Webfinger

To let the user connect their remoteStorage account at run-time, we rely on Webfinger. This is a simple protocol that allows the app to discover the configuration parameters of the current user's remoteStorage server, based on a user@host identifier. It also allows any app to directly discover public profile data about any user, such as their full name, public PGP key, or avatar URL.
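To make the discovery step a bit more concrete, here is a minimal sketch of how an app might go from a user@host identifier to a storage location. It assumes the /.well-known/webfinger endpoint and JSON layout as standardized in RFC 7033 (earlier drafts differed). The helper names, the injected fetchJson function and the 'remotestorage' link relation are assumptions for illustration, not a definitive client.

```typescript
// Sketch of Webfinger-based discovery, assuming the RFC 7033 endpoint and
// JSON layout. Helper names and the "remotestorage" rel are assumptions.
interface WebfingerLink {
  rel: string;
  href?: string;
  type?: string;
}

interface WebfingerRecord {
  subject?: string;
  links?: WebfingerLink[];
}

// Turn "user@host" into the URL where that host serves the user's record.
function webfingerUrl(userAddress: string): string {
  const at = userAddress.lastIndexOf("@");
  if (at <= 0 || at === userAddress.length - 1) {
    throw new Error(`not a user@host address: ${userAddress}`);
  }
  const host = userAddress.slice(at + 1);
  const resource = encodeURIComponent(`acct:${userAddress}`);
  return `https://${host}/.well-known/webfinger?resource=${resource}`;
}

// Return the href of the first link with the given relation, if any.
function findLink(record: WebfingerRecord, rel: string): string | undefined {
  for (const link of record.links || []) {
    if (link.rel === rel) return link.href;
  }
  return undefined;
}

// The HTTP layer is passed in, so the sketch does not depend on a runtime.
type JsonFetcher = (url: string) => Promise<unknown>;

async function discoverStorage(
  userAddress: string,
  fetchJson: JsonFetcher
): Promise<string | undefined> {
  const record = (await fetchJson(webfingerUrl(userAddress))) as WebfingerRecord;
  return findLink(record, "remotestorage");
}
```

In a browser, fetchJson could be something like (url) => fetch(url).then((r) => r.json()). The same findLink helper would serve for profile data such as an avatar link.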

There is some controversy, and a lot of bikeshedding, about how to use Webfinger records to refer to a user on the web. One option is to consider acct:user@host as a URL (the Webfinger spec, at least in some of its versions, registers acct: as a new URI scheme). Another option is to use the user@host syntax only in user-facing contexts, and use the actual document location (with either https: or http: as the scheme) when constructing linked data documents.

In order for Webfinger to work well, the data offered in a Webfinger record should be user-editable. For instance, if a user wants to publish a new PGP key, they should have an easy way to do that at their identity provider. Currently, the easiest way, if not the only way, to have full control over your Webfinger record, is to host it yourself, on your Indie Web domain.

It is quite easy to add a Webfinger record to your domain: simply follow the instructions in the Webfinger spec.
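For orientation, a minimal self-hosted record might look like the object built below. This is only a sketch, assuming the JSON layout of RFC 7033; the address and URLs are placeholders, and the two link relations are just examples of what a record could advertise.

```typescript
// Sketch of a self-hosted Webfinger record, assuming the RFC 7033 JSON layout.
// The address and URLs used below are placeholders, not real endpoints.
interface ProfileLink {
  rel: string;
  href: string;
}

interface ProfileRecord {
  subject: string;
  links: ProfileLink[];
}

// buildRecord is a hypothetical helper, not part of any library.
function buildRecord(
  userAddress: string,
  profileUrl: string,
  avatarUrl: string
): ProfileRecord {
  return {
    subject: `acct:${userAddress}`,
    links: [
      { rel: "http://webfinger.net/rel/profile-page", href: profileUrl },
      { rel: "http://webfinger.net/rel/avatar", href: avatarUrl },
    ],
  };
}

// Serialized form, to be served from /.well-known/webfinger on your domain.
const recordJson: string = JSON.stringify(
  buildRecord("me@example.com", "https://example.com/", "https://example.com/avatar.png"),
  null,
  2
);
```

Unhosted web apps run in the browser, so they can only read the record if it is served with a CORS header such as Access-Control-Allow-Origin: *. RFC 7033 names application/jrd+json as the media type.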

Contacts

Many apps will give users a way to connect and interact with each other. For this we defined the remoteStorage.contacts module. It acts as an addressbook, in that it allows an app to store basic contact information about other users.

Storing the current user's contacts in remoteStorage.contacts has two big advantages: it keeps this personal and possibly sensitive data under the user's control, rather than under the control of the app developer, and it allows multiple apps to reuse one same addressbook.

You could write a social unhosted app to retrieve your friends list from Facebook (you would have to add this as a verb on Sockethub's Facebook platform), and store it on your remoteStorage. You could then set a contact from your addressbook as a target for something you post (and the message would automatically be sent via Facebook chat if it's a Facebook contact).

This could then be made into a generic messaging app, from where you can contact any of your friends from one addressbook, seamlessly across the borders of Facebook, Twitter and email.
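The routing idea could be sketched roughly as follows. The contact shape and the outgoing message object are invented for this illustration; they are not the actual remoteStorage.contacts or Sockethub formats.

```typescript
// Sketch: one addressbook, with the platform decided per contact.
// All types and function names here are hypothetical.
type Platform = "facebook" | "twitter" | "email";

interface Contact {
  name: string;
  platform: Platform;
  address: string; // Facebook id, Twitter handle, or email address
}

interface OutgoingMessage {
  platform: Platform;
  to: string;
  text: string;
}

function findContact(addressbook: Contact[], name: string): Contact | undefined {
  for (const contact of addressbook) {
    if (contact.name === name) return contact;
  }
  return undefined;
}

// The app only names the contact; the platform follows from the addressbook.
function addressMessage(
  addressbook: Contact[],
  name: string,
  text: string
): OutgoingMessage {
  const contact = findContact(addressbook, name);
  if (!contact) throw new Error(`no contact named ${name}`);
  return { platform: contact.platform, to: contact.address, text: text };
}
```

An app would then hand the resulting object to whatever transport handles that platform, so the user never has to choose between Facebook, Twitter and email themselves.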

User search

Another goal is to create an unhosted web app that contains index data for the public profiles of millions of people, again seamlessly spanning various platforms. I created a prototype for this last year, and called it useraddress. At the time, we were thinking about making this a centralized service, but I think it's better to package this index into a BitTorrent file, and distribute it to anybody who wants to mirror it.

All this is still in a very early stage of development, and currently for most unhosted web apps we still have to type user addresses (like email addresses or Twitter handles) from memory each time. But we are very determined to get this part working well, because it is often lack of searchability that limits the usefulness of federated social servers. And liberating the social graph is a big part of liberating users from the grip of platform monopolies.

Contacting other users

The web does not have a way to contact people. It has a way to see other people's profile pages, and to follow updates from them through RSS/Atom and maybe PubSubHubbub. And on a profile page there may be a human-readable contact form, but this is not currently something that is standardized in any way. The closest thing we have to a standardized "contact me" link is the mailto: URI scheme, but even this is usually handled by some application outside the web browser.

We could propose a link-relation for this, for instance 'post-me-anything', so that someone's Webfinger record could have a machine-readable link to a URL to which an app can post messages using HTTP POST (it would need CORS headers). But proposing this in itself doesn't help us either: it would only be valuable if a significant number of people also actually implemented it on their Indie Web domains.
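To show what such a proposal might mean for an app, here is a minimal sketch that looks for a 'post-me-anything' link in an already-fetched Webfinger record and prepares the POST request. The link relation is only the hypothetical one floated above, and the JSON body format and all names in the code are assumptions too.

```typescript
// Sketch: prepare an HTTP POST to a user's hypothetical "post-me-anything" link.
// The relation name comes from the proposal above; nothing here is standardized.
interface RecordLink {
  rel: string;
  href?: string;
}

interface UserRecord {
  links?: RecordLink[];
}

interface PostRequest {
  url: string;
  method: "POST";
  headers: { [name: string]: string };
  body: string;
}

// Returns undefined when the user's record does not advertise such a link.
function buildContactRequest(
  record: UserRecord,
  from: string,
  text: string
): PostRequest | undefined {
  for (const link of record.links || []) {
    if (link.rel === "post-me-anything" && link.href) {
      return {
        url: link.href,
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ from: from, text: text }), // assumed body format
      };
    }
  }
  return undefined;
}
```

In a browser the request could be sent with fetch(request.url, request). Because of the JSON content type, the browser would first send a CORS preflight request, which the receiving server would have to answer as well.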

I am still thinking about what we can do about this situation, and I'm very curious what other people think about this. If you have any ideas about this, or about any other topic touched upon in this episode, or you would like to help build unhosted web apps around remoteStorage.contacts, then please reply to the mailing list!
