initial commit
commit c54df77319
3 changed files with 171 additions and 0 deletions
WIP/ARCHITECTURE.txt (new file, +44)
mutex for distribution + packet spooler

in the elixir version, this used one process for each of them,
but that led to race conditions and deadlocks between these parts...

ok, what do I want:

* spooler should use flat files, split into multiple files when they get too big
* no TLS: it just makes stuff more complicated, and we also can't trust any nodes in the network
  besides those originating/producing messages, which have pubkeys + signatures, which we can manage through an allowlist
* it might be the case that there exist clusters of servers which trust each other,
  but they will probably be connected by a trusted network, so no need for TLS there either.
* might make sense to split sockets into read and write parts
* use 2 threads per connection
* worker threadpool for filtering
* 16-bit message length fields make abuse and such much harder (e.g. the problematic stuff is mostly images and large binaries,
  think mostly copyright bs, porn, etc., maybe malware sharing. don't want any of that,
  even if parts would be nice, because moderating that is too much of a hassle, and really exhausting)
  max 64KiB messages have another advantage: they make it much harder to completely DDoS the network
  and exhaust the RAM and disk space of participating servers.
* need multiple channels, due to feedback loop
* use non-async rust: we can use `sendfile(2)` via std::io::copy!
* shard messages according to the first bytes of the signature

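The last bullet could look like the following minimal sketch; `shard_index` is my name, not from the notes, and it assumes the shard count of 2^16 mentioned further down (i.e. the first two signature bytes):

```rust
// Hypothetical sketch: derive a shard index from the first two bytes of a
// message signature. With 2^16 shard slots, the first two big-endian bytes
// map a signature directly onto a slot.
fn shard_index(signature: &[u8]) -> usize {
    // assumes signatures are at least 2 bytes (ed25519 signatures are 64 bytes)
    u16::from_be_bytes([signature[0], signature[1]]) as usize
}

fn main() {
    let sig = [0xab, 0xcd, 0x00, 0x01];
    assert_eq!(shard_index(&sig), 0xabcd);
}
```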
Channels between threads:

* initially, from spool -> conn/write
  * "send :ihave for all entries in the spooler"
* conn/read -> filterpool
  * "received message"
* conn/read -> conn/write
  * "got :ihave, send :pull"
* filterpool -> conn/write
  * "send :ihave for latest slice for recently changed pubkey (multiple such messages should be coalesced)"
  * "distribute message" via sendfile

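A rough sketch of that topology with std::sync::mpsc (message types and names are placeholders of mine, and the spool channel is left out):

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical message types for the channels listed above.
enum FilterCmd { Received(Vec<u8>) }
enum WriteCmd { Pull(u64), Distribute(u64) }

// Wires conn/read -> filterpool and (conn/read, filterpool) -> conn/write,
// then counts what conn/write would have to handle.
fn run_pipeline() -> usize {
    let (filter_tx, filter_rx) = mpsc::channel::<FilterCmd>();
    let (write_tx, write_rx) = mpsc::channel::<WriteCmd>();

    let write_tx2 = write_tx.clone();
    let reader = thread::spawn(move || {
        // conn/read: hand a received message to the filterpool, answer :ihave
        filter_tx.send(FilterCmd::Received(vec![1, 2, 3])).unwrap();
        write_tx2.send(WriteCmd::Pull(7)).unwrap();
    });

    let filter = thread::spawn(move || {
        // filterpool: after filtering, tell conn/write to distribute
        for cmd in filter_rx {
            match cmd {
                FilterCmd::Received(msg) => {
                    write_tx.send(WriteCmd::Distribute(msg.len() as u64)).unwrap();
                }
            }
        }
    });

    reader.join().unwrap();
    filter.join().unwrap();
    // all senders are dropped now, so draining conn/write's queue terminates
    write_rx.iter().count()
}

fn main() {
    assert_eq!(run_pipeline(), 2);
}
```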
no coordination should be necessary for closed read/write channels
(the appropriate threads just terminate), but it is necessary to have
a way to schedule a re-connection after both terminate. So we need a
central list of configured peers, where the associated threads are linked
via AtomicUsize + event-listener::Event

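One way that per-peer bookkeeping could look; this is a sketch under my own assumptions (std's Condvar stands in for the event-listener crate's Event, and `PeerSlot` is a made-up name):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Hypothetical peer slot: `live` counts the read + write threads of one
// connection; std's Condvar stands in for event-listener::Event here.
struct PeerSlot {
    live: AtomicUsize,
    lock: Mutex<()>,
    reconnect: Condvar,
}

impl PeerSlot {
    fn thread_done(&self) {
        // the last of the two threads to terminate wakes the scheduler;
        // taking the lock before notifying avoids a lost wakeup
        if self.live.fetch_sub(1, Ordering::AcqRel) == 1 {
            let _g = self.lock.lock().unwrap();
            self.reconnect.notify_all();
        }
    }
}

// Returns the live-count seen once both connection threads are gone.
fn wait_for_teardown() -> usize {
    let slot = Arc::new(PeerSlot {
        live: AtomicUsize::new(2),
        lock: Mutex::new(()),
        reconnect: Condvar::new(),
    });
    for _ in 0..2 {
        let s = Arc::clone(&slot);
        thread::spawn(move || s.thread_done());
    }
    let mut g = slot.lock.lock().unwrap();
    while slot.live.load(Ordering::Acquire) != 0 {
        g = slot.reconnect.wait(g).unwrap();
    }
    // here the central peer list would schedule a re-connection
    slot.live.load(Ordering::Acquire)
}

fn main() {
    assert_eq!(wait_for_teardown(), 0);
}
```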
the filterpool acquires a mutex for writing to a shard
(allocate an array with 2^16 slots for that)
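A minimal sketch of that lock array, assuming (as above) that the shard index is the first two signature bytes; the payload type per slot is just a placeholder:

```rust
use std::sync::Mutex;

// 2^16 per-shard write locks, indexed by the first two signature bytes.
struct Shards {
    locks: Vec<Mutex<Vec<u8>>>, // slot payload type is a placeholder
}

impl Shards {
    fn new() -> Self {
        Shards { locks: (0..1usize << 16).map(|_| Mutex::new(Vec::new())).collect() }
    }

    fn write(&self, signature: &[u8], byte: u8) {
        let idx = u16::from_be_bytes([signature[0], signature[1]]) as usize;
        // a filterpool thread takes the shard mutex, then appends
        self.locks[idx].lock().unwrap().push(byte);
    }
}

fn main() {
    let shards = Shards::new();
    shards.write(&[0x12, 0x34, 0xff], 7);
    assert_eq!(shards.locks[0x1234].lock().unwrap().len(), 1);
}
```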
WIP/PROTO.txt (new file, +72)
the Elixir version uses an ASN.1 based format, but the support for ASN.1 in
languages other than C is surprisingly bad.
So I'll resort to a hard-coded format. Again.

----

InitMessage ::=
    capabilities:ShortList<u32(big-endian)>

(* the InitMessage is the first thing sent on a newly established connection.
 * both sides are expected to send the capabilities they support,
 * then wait for the other side to send theirs (meaning they don't
 * block each other during this), and then use just the capabilities
 * which both peers support *)

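The negotiation described in the comment boils down to a set intersection; a minimal sketch (`negotiate` is my name for it):

```rust
use std::collections::HashSet;

// Each side sends its capability list; afterwards only the intersection
// of both lists is used. Result is sorted for determinism.
fn negotiate(ours: &[u32], theirs: &[u32]) -> Vec<u32> {
    let theirs: HashSet<u32> = theirs.iter().copied().collect();
    let mut common: Vec<u32> =
        ours.iter().copied().filter(|c| theirs.contains(c)).collect();
    common.sort_unstable();
    common
}

fn main() {
    assert_eq!(negotiate(&[1, 2, 3], &[2, 3, 4]), vec![2, 3]);
}
```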
ProtoMessage ::=
    algorithm:SigAlgo
    pubkeylen:u16(big-endian)
    length:u16(big-endian)
    data:[length]u8=ProtoMsgKind<pubkeylen>

ProtoMsgKind<pubkeylen> ::=
      (* xfer *)          [0x00] XferSlice<pubkeylen>
    | (* summary-pull *)  [0x01] SummaryList<pubkeylen>
    | (* summary-ihave *) [0x02] SummaryList<pubkeylen>

(* after the initial capabilities are sent, both sides are expected to send
 * summaries for all public keys they know + all available slices for these
 * pubkeys *)

ShortList<T> ::= length:u16(big-endian) data:[length]T
XSList<T> ::= length:u8 data:[length]T

(* "raw" summary header overhead = 41 bytes, per 65535 slices *)
SummaryList<pubkeylen> ::=
    data:[*]Slice<pubkeylen> (* uses all remaining space in the packet *)

Slice<pubkeylen> ::=
    pubkey:[pubkeylen]u8
    start:u64
    length:u16

XferSlice<pubkeylen> ::=
    pubkey:[pubkeylen]u8
    siglen:u16(big-endian)
    start:u64
    data:[*]XferBlob<siglen> (* uses all remaining space in the packet *)

XferBlob<siglen> ::=
    signature:[siglen]u8 (* signature gets calculated over `blid ++ data` *)
    ttl:u8
    data:XferInner

XferInner ::=
      (* normal *) [0x00] XferNormal
    | (* prune *)  [0x01] (* empty; NOTE: this packet indicates that
       * all packets before it should be deleted on all servers;
       * used for data protection/privacy compliance;
       * although it is possible to delete such packets after a grace period
       * (e.g. 1 year) during which every peer should've had a chance to retrieve
       * it, doing that is discouraged *)

XferNormal ::=
    attrs:ShortList<KeyValuePair>
    data:ShortList<u8>

KeyValuePair ::=
    key:XSList<u8>
    value:ShortList<u8>

SigAlgo ::=
    (* ed25519 *) [0x65, 0x64, 0xff, 0x13]

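Reading the fixed part of a ProtoMessage (4-byte SigAlgo id, then pubkeylen and length as big-endian u16s) could look like this sketch; `parse_header` is my name, and only the ed25519 id from the grammar is recognized:

```rust
// The ed25519 SigAlgo identifier from the grammar above.
const SIGALGO_ED25519: [u8; 4] = [0x65, 0x64, 0xff, 0x13];

// Parse the fixed 8-byte ProtoMessage header; returns (pubkeylen, length).
fn parse_header(buf: &[u8; 8]) -> Option<(u16, u16)> {
    if buf[0..4] != SIGALGO_ED25519 {
        return None; // unknown signature algorithm
    }
    let pubkeylen = u16::from_be_bytes([buf[4], buf[5]]);
    let length = u16::from_be_bytes([buf[6], buf[7]]);
    Some((pubkeylen, length))
}

fn main() {
    let hdr = [0x65, 0x64, 0xff, 0x13, 0x00, 0x20, 0x01, 0x00];
    assert_eq!(parse_header(&hdr), Some((32, 256)));
}
```

The u16 `length` field is also what enforces the 64KiB message cap from the architecture notes: nothing larger is even representable on the wire.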
WIP/SPOOLER.txt (new file, +55)
spool file organization:

the directory tree looks like this:
    {SigAlgo:04x}/{:02x}/{:02x}/...
the first level is the SigAlgo identifier of the signature algorithm in use.
the second and third levels are the first 2 bytes of the public key of a party,
encoded as hexadecimal.

the public keys all have the same length, and get encoded as urlsafe base64.

## per public key ...

... there can be a bunch of associated files:
- {pubkey}.{chunkid}.slcs contains slice listings (* =SLCS *)
- {pubkey}.{chunkid}.data contains the actual data (* =DataS *)
- {pubkey}.lock is a lock file for the public key, which gets used to prevent overlapping writes and GCs...

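A sketch of building such a directory path; note this assumes the SigAlgo identifier is simply hex-encoded for the first level (the `{SigAlgo:04x}` notation above leaves the exact width open), and `spool_dir` is a made-up name:

```rust
// Build the spool directory for a pubkey: hex of the SigAlgo id, then the
// first two pubkey bytes as hex sub-levels (an assumption; the notes leave
// the exact rendering of the SigAlgo level open).
fn spool_dir(sigalgo: &[u8; 4], pubkey: &[u8]) -> String {
    let algo: String = sigalgo.iter().map(|b| format!("{:02x}", b)).collect();
    format!("{}/{:02x}/{:02x}", algo, pubkey[0], pubkey[1])
}

fn main() {
    let p = spool_dir(&[0x65, 0x64, 0xff, 0x13], &[0xab, 0xcd, 0xee]);
    assert_eq!(p, "6564ff13/ab/cd");
}
```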
<proto>

SLCS ::= [*]SlicePtr

SlicePtr ::= (* 16 bytes per entry *)
    (* slice boundaries *)
    slice:Slice
    (* data boundaries *)
    dptr:Pointer

Slice ::=
    start:u64(big-endian)
    length:u16(big-endian)

Pointer ::=
    start:u32(big-endian)
    length:u16(big-endian)

DataS ::=
    (* @siglen is inferred from the used SigAlgo *)
    [*]XferBlob<@siglen>

</proto>

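The SlicePtr layout above (u64 + u16 slice boundaries, u32 + u16 data pointer) indeed adds up to 16 bytes per entry; a round-trip sketch with my own field names:

```rust
// On-disk SlicePtr: Slice (u64 start + u16 length) then Pointer
// (u32 start + u16 length), all big-endian, 16 bytes total.
struct SlicePtr {
    slice_start: u64,
    slice_len: u16,
    data_start: u32,
    data_len: u16,
}

impl SlicePtr {
    fn to_bytes(&self) -> [u8; 16] {
        let mut out = [0u8; 16];
        out[0..8].copy_from_slice(&self.slice_start.to_be_bytes());
        out[8..10].copy_from_slice(&self.slice_len.to_be_bytes());
        out[10..14].copy_from_slice(&self.data_start.to_be_bytes());
        out[14..16].copy_from_slice(&self.data_len.to_be_bytes());
        out
    }

    fn from_bytes(b: &[u8; 16]) -> Self {
        SlicePtr {
            slice_start: u64::from_be_bytes(b[0..8].try_into().unwrap()),
            slice_len: u16::from_be_bytes(b[8..10].try_into().unwrap()),
            data_start: u32::from_be_bytes(b[10..14].try_into().unwrap()),
            data_len: u16::from_be_bytes(b[14..16].try_into().unwrap()),
        }
    }
}

fn main() {
    let p = SlicePtr { slice_start: 7, slice_len: 2, data_start: 99, data_len: 512 };
    let rt = SlicePtr::from_bytes(&p.to_bytes());
    assert_eq!(rt.slice_start, 7);
    assert_eq!(rt.data_len, 512);
}
```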
... for compaction, the corresponding pubkey gets locked,
- new temporary files are created in the corresponding directory,
- the slices get sorted, and for each blob the length gets calculated,
- summed up starting from the newest blob, going in reverse,
- until we hit the maximum size per pubkey
  (usually available storage space * 0.8 divided by the number of known pubkeys)
- then we cut off the remaining, not yet processed blobs
- and now start from the first kept blob going forward
- write all of them to a new data file, and create a corresponding slice listing file
- note that adjacent slices in the slice listing file also get merged

the compaction algorithm should run roughly once every 15 minutes, but only visit
pubkeys to which data has been appended in that timespan.
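The size-budget step of the list above can be sketched as follows; `keep_from` and the budget value are mine, and the actual file rewriting is left out:

```rust
// Walk blob lengths from newest to oldest, summing until the per-pubkey
// budget is hit; returns the index of the first (oldest) blob to keep.
// Everything before that index gets cut off by compaction.
fn keep_from(blob_lens: &[u64], budget: u64) -> usize {
    let mut total = 0u64;
    let mut first_kept = blob_lens.len();
    for (i, len) in blob_lens.iter().enumerate().rev() {
        if total + len > budget {
            break; // older blobs no longer fit the budget
        }
        total += len;
        first_kept = i;
    }
    first_kept
}

fn main() {
    // oldest..newest blob sizes; with budget 100 only the last two fit
    assert_eq!(keep_from(&[80, 30, 40, 50], 100), 2);
}
```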
|