Update design docs

Anthony Wang 2023-05-10 16:44:18 -04:00
parent 6e8bcca87e
commit 812bbb47d4
Signed by: a
GPG key ID: 42A5B952E6DD8D38

@@ -12,23 +12,19 @@ Purely peer-to-peer protocols like IPFS suffer from being way too complex and sl
## Design
Alright, let's solve all those problems listed above! Kela consists of three components, a name resolution system using a DHT, a messaging service, and a storage service.
Alright, let's solve all those problems above with Kela! Kela consists of three components: a name resolution system using a DHT, a storage service, and a messaging service.
In Kela, each user has an ID, which is a public key. Each user is associated with one or more Kela servers, which store that user's data. To find out which servers a user is associated with, you can query the name resolution system. All Kela servers participate in the name resolution system and act as DHT nodes. Each server stores a complete list of all DHT nodes. When a new server joins the DHT, it tries to peer with an existing server in the DHT. Say server `example.com` would like to peer with `test.net`. `example.com` first sends a GET request to `test.net/peer?peer=example.com`. `test.net` replies with its list of DHT nodes. Once `example.com` receives this reply, it adds `test.net` to its list of DHT nodes and attempts to peer with all servers in the reply that it hasn't peered with yet. `test.net` now also tries to peer with the server that just contacted it, in this case `example.com`. Servers periodically go through their list of DHT nodes and remove nodes that are no longer online.
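The peering handshake described above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the `Server` class, the `network` dictionary standing in for HTTP, and the method names are all hypothetical, and a server marks a peer as known before the reply arrives so that the mutual peering terminates.

```python
class Server:
    """Hypothetical in-memory stand-in for a Kela server's peering logic."""

    def __init__(self, domain, network):
        self.domain = domain
        self.network = network  # assumed stand-in for HTTP: domain -> Server
        self.nodes = set()      # known DHT nodes, excluding ourselves
        network[domain] = self

    def handle_peer(self, peer):
        """Simulates GET /peer?peer=<peer>: return our node list, then peer back."""
        reply = sorted(self.nodes | {self.domain})
        if peer not in self.nodes:
            self.peer_with(peer)  # the contacted server also peers with the caller
        return reply

    def peer_with(self, target):
        """Peer with target, then with every server in its reply we don't know yet."""
        if target == self.domain or target in self.nodes:
            return
        # Simplification: record the peer first so mutual peering cannot loop forever.
        self.nodes.add(target)
        for domain in self.network[target].handle_peer(self.domain):
            self.peer_with(domain)


network = {}
for domain in ("test.net", "example.com", "kela.example"):
    Server(domain, network)

# Two new servers join by contacting an existing one.
network["example.com"].peer_with("test.net")
network["kela.example"].peer_with("test.net")
```

After both joins, every server in this sketch knows every other server, even though `kela.example` only ever contacted `test.net` directly.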
DHT get/set TODO
The DHT stores key-value pairs. The key consists of a user's public key and a timestamp, computed as the SHA-256 hash of the public key modulo 600, plus the current Unix time in seconds, all divided by 600 and rounded down. The value consists of a timestamp (the current Unix time in seconds), a list of servers that the user is associated with, where the first server is their primary server, and a signature. A key-value pair is assigned to the 5 servers whose domain names have the smallest SHA-256 hashes greater than the SHA-256 hash of the key. The purpose of the elaborate timestamp in the key is to ensure that the set of servers assigned to a key-value pair rotates every 600 seconds, so an attacker must control a very large portion of the DHT to mount a denial-of-service attack against a specific key-value pair. When servers join and leave the DHT, the servers that a user is associated with reassign that user's key-value pair to a new server where necessary, so that 5 servers always store it. The DHT supports two operations, get and set. For set operations, the server checks the signature to ensure the validity of the request. When a server receives either operation, it computes the SHA-256 hash of the key and checks whether it is supposed to store that key-value pair. If so, it performs the operation on that pair. Otherwise, it contacts in parallel the 5 servers that store the pair. If the operation is a get, the server looks at the 5 replies and returns the value with the most recent timestamp. If the operation is a set and one of the 5 parallel requests fails, the server removes that offline server from its DHT node list and assigns a new server to the key-value pair to replace it. Each server periodically goes through its stored key-value pairs and deletes old ones.
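As a rough sketch of the key derivation and server assignment described above, assuming Python's standard `hashlib`: the function names, the byte encoding of the key, and the wrap-around to the smallest hashes when fewer than 5 servers hash above the key are all assumptions for illustration, not part of the design text.

```python
import hashlib

ROTATION_SECONDS = 600  # rotation period from the design
REPLICAS = 5            # servers per key-value pair from the design


def sha256_int(data):
    """SHA-256 digest interpreted as a big-endian integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def key_timestamp(public_key, now):
    """floor((SHA-256(public_key) mod 600 + now) / 600)."""
    offset = sha256_int(public_key) % ROTATION_SECONDS
    return (offset + now) // ROTATION_SECONDS


def dht_key(public_key, now):
    # Assumed encoding: public key bytes, a separator, then the decimal timestamp.
    return public_key + b"\n" + str(key_timestamp(public_key, now)).encode()


def assigned_servers(key, domains):
    """The REPLICAS domains with the smallest hashes greater than the key hash."""
    key_hash = sha256_int(key)
    ring = sorted(domains, key=lambda d: sha256_int(d.encode()))
    after = [d for d in ring if sha256_int(d.encode()) > key_hash]
    before = ring[: len(ring) - len(after)]
    # Assumption: wrap around to the smallest hashes if too few servers follow the key.
    return (after + before)[:REPLICAS]


def freshest(replies):
    """For a get: pick the reply with the most recent timestamp."""
    return max(replies, key=lambda value: value["timestamp"])
```

Because the offset depends on the public key, different users' keys roll over at different moments within each 600-second window rather than all at once.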
Messaging TODO
The storage service uses a weaker form of primary-backup replication. The storage service supports three operations, get, set, and delete, and a user's primary server always handles operations. Get operations are trivial. For a set or delete operation, the primary makes the modification and notifies all the backups about the operation, but responds to the user immediately without ensuring that the backups have performed the operation. All operations are stored in a log, which records only the operation type and the filename of the modified file, not the file's contents. The log and files are persisted to disk. If a backup is offline, the primary maintains a log of all pending operations to be sent to that backup and keeps retrying. If the primary is offline, no progress can be made, but the user can designate any of the backups as the new primary, which also requires a set operation to the DHT to update that user's list of servers. When a backup becomes the primary, it must ensure that any other backups that are ahead of it roll back their operations to match it. To roll back a delete operation, a backup can contact the new primary to get the file.
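The replication behavior described above might look roughly like the following in-memory sketch. The `Primary` and `Backup` classes and their methods are hypothetical, persistence to disk and rollback are omitted, and `flush` runs synchronously here, whereas a real primary would respond to the user first and notify backups in the background.

```python
class Backup:
    """Hypothetical backup server holding replicated files and an operation log."""

    def __init__(self):
        self.files = {}
        self.log = []      # (operation, filename) only, never file contents
        self.online = True

    def apply(self, op, filename, contents=None):
        if not self.online:
            raise ConnectionError("backup offline")
        if op == "set":
            self.files[filename] = contents
        else:
            self.files.pop(filename, None)
        self.log.append((op, filename))


class Primary:
    """Hypothetical primary server that queues operations for each backup."""

    def __init__(self, backups):
        self.files = {}
        self.log = []
        self.backups = backups
        self.pending = [[] for _ in backups]  # per-backup queue of unsent operations

    def get(self, filename):
        return self.files.get(filename)

    def set(self, filename, contents):
        self.files[filename] = contents
        self._record("set", filename)

    def delete(self, filename):
        self.files.pop(filename, None)
        self._record("delete", filename)

    def _record(self, op, filename):
        self.log.append((op, filename))
        for queue in self.pending:
            queue.append((op, filename))
        self.flush()

    def flush(self):
        """Send pending operations in order; stop at the first failure and retry later."""
        for backup, queue in zip(self.backups, self.pending):
            while queue:
                op, filename = queue[0]
                try:
                    # The log has no contents, so a replayed set sends the current file.
                    backup.apply(op, filename, self.files.get(filename))
                except ConnectionError:
                    break
                queue.pop(0)
```

Calling `flush` again once an offline backup returns replays its queue in order, so the backup converges to the primary's current state.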
Storage TODO
The messaging service allows you to send arbitrary HTTP requests to other users. These messages are ephemeral. If the recipient is not online, then the message fails.
Old stuff:
In Kela, applications are users too and have a public key. They are functionally equivalent to automated users.
Let's start where ActivityPub went wrong: identity. ActivityPub uses usernames plus the instance URL (for instance, `billiam@example.com`) as identifiers. This ties your identity strongly to a server, but it's actually not necessary. In Kela, each user is still associated with one (or more) servers, but public keys are identifiers. Each user has a public and private key, and when you want to find a user's server, you query a [DHT](https://en.wikipedia.org/wiki/Distributed_hash_table) formed by all servers. This returns a string containing their server's URL or some other address, signed with that user's private key to prevent tampering.
Your account can be associated with multiple servers, and your data is replicated between them. To prevent sync conflicts, you designate one server as your main server, which is a single source of truth that replicates data to all your other servers. To migrate your account between servers, simply associate your account with a new server, which syncs all your data over to that server, and then disassociate your account from the old server.
Because public keys are annoying to read and type, Kela uses a friendly [petname](http://www.skyhunter.com/marcs/petnames/IntroPetNames.html) [system](https://spritelyproject.org/news/petname-systems.html) so users can assign human-readable names to public keys.
OLD STUFF:
## Applications
@@ -38,9 +34,6 @@ In Kela, applications and bots are the same thing. Just like users, applications
You can tunnel HTTP (no TLS needed!) over Kela to host websites. Kela supports tunneling arbitrary data. Even better, Kela can replace your website's authentication system so you don't need passwords anymore!
## Storage
Kela is also a decentralized storage system. You, and websites, can store data on your servers, and it will be replicated between them. Kela also supports access controls so you can give people read or write access to specific files.
## Examples
@@ -50,11 +43,6 @@ Kela is also a decentralized storage system. You, and websites, can store data o
Let's try building a simple messaging app with Kela. First, create a folder on Kela's storage system, and share it with the person you are messaging. Then, both people can add messages to the folder. Of course, the low-level details here can be abstracted away by a GUI.
## Issues
Coupling identity closely to public keys makes verifying message integrity easy. However, if your private key leaks, you are screwed.
## History
A [few core ideas](https://social.exozy.me/@ta180m/108201791226634267) for Kela were braindumped by [Anthony Wang](https://a.exozy.me) on the fediverse in April 2022 and written down in a [repository](https://git.exozy.me/a/Kela/src/commit/a2561afe554382ae4ba9fcd9beb276497127dc3c). A few months later, Anthony Wang and [Alek Westover](https://awestover.github.io) developed a very basic [prototype](https://git.exozy.me/a/HackMIT) during [HackMIT 2022](https://hackmit.org). The name "Kela" comes from spelling "Alek" backwards.