initial commit
commit c54df77319
3 changed files with 171 additions and 0 deletions
WIP/ARCHITECTURE.txt (new file, +44)
mutex for distribution + packet spooler

in the elixir version, this used one process for each of them,
but that led to race conditions and deadlocks between these parts...

ok, what do I want:

* spooler should use flat files, split into multiple files when they get too big
* no TLS: it just makes stuff more complicated, and we also can't trust any nodes in the network
  besides those originating/producing messages, which have pubkeys + signatures, which we can manage through an allowlist
* it might be the case that there exist clusters of servers which trust each other,
  but they will probably be connected by a trusted network, so no need for TLS there either.
* might make sense to split sockets into read and write parts
* use 2 threads per connection
* worker threadpool for filtering
* 16-bit message length fields make abuse and such much harder (e.g. the problematic stuff is mostly images and large binaries,
  think mostly copyright bs, porn, etc., maybe malware sharing. don't want any of that,
  even if parts would be nice, because moderating that is too much of a hassle, and really exhausting)
  max 64KiB messages have another advantage: they make it much harder to completely DDoS the network
  and exhaust the RAM and disk space of participating servers.
* need multiple channels, due to feedback loop
* use non-async rust: we can use `sendfile(2)` via std::io::copy!
* shard messages according to the first bytes of the signature

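The last bullet could look like the following minimal sketch; `shard_index` is my name, not from the notes, and it assumes the shard count of 2^16 mentioned further down (i.e. the first two signature bytes):

```rust
// Hypothetical sketch: derive a shard index from the first two bytes of a
// message signature. With 2^16 shard slots, the first two big-endian bytes
// map a signature directly onto a slot.
fn shard_index(signature: &[u8]) -> usize {
    // assumes signatures are at least 2 bytes (ed25519 signatures are 64 bytes)
    u16::from_be_bytes([signature[0], signature[1]]) as usize
}

fn main() {
    let sig = [0xab, 0xcd, 0x00, 0x01];
    assert_eq!(shard_index(&sig), 0xabcd);
}
```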
Channels between threads:

* initially, from spool -> conn/write
  * "send :ihave for all entries in the spooler"
* conn/read -> filterpool
  * "received message"
* conn/read -> conn/write
  * "got :ihave, send :pull"
* filterpool -> conn/write
  * "send :ihave for latest slice for recently changed pubkey (multiple such messages should be coalesced)"
  * "distribute message" via sendfile

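A rough sketch of that topology with std::sync::mpsc (message types and names are placeholders of mine, and the spool channel is left out):

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical message types for the channels listed above.
enum FilterCmd { Received(Vec<u8>) }
enum WriteCmd { Pull(u64), Distribute(u64) }

// Wires conn/read -> filterpool and (conn/read, filterpool) -> conn/write,
// then counts what conn/write would have to handle.
fn run_pipeline() -> usize {
    let (filter_tx, filter_rx) = mpsc::channel::<FilterCmd>();
    let (write_tx, write_rx) = mpsc::channel::<WriteCmd>();

    let write_tx2 = write_tx.clone();
    let reader = thread::spawn(move || {
        // conn/read: hand a received message to the filterpool, answer :ihave
        filter_tx.send(FilterCmd::Received(vec![1, 2, 3])).unwrap();
        write_tx2.send(WriteCmd::Pull(7)).unwrap();
    });

    let filter = thread::spawn(move || {
        // filterpool: after filtering, tell conn/write to distribute
        for cmd in filter_rx {
            match cmd {
                FilterCmd::Received(msg) => {
                    write_tx.send(WriteCmd::Distribute(msg.len() as u64)).unwrap();
                }
            }
        }
    });

    reader.join().unwrap();
    filter.join().unwrap();
    // all senders are dropped now, so draining conn/write's queue terminates
    write_rx.iter().count()
}

fn main() {
    assert_eq!(run_pipeline(), 2);
}
```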
no coordination should be necessary for closed read/write channels
(the appropriate threads just terminate), but it is necessary to have
a way to schedule a re-connection after both terminate. So we need a
central list of configured peers, where the associated threads are linked
via AtomicUsize + event-listener::Event

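One way that per-peer bookkeeping could look; this is a sketch under my own assumptions (std's Condvar stands in for the event-listener crate's Event, and `PeerSlot` is a made-up name):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Hypothetical peer slot: `live` counts the read + write threads of one
// connection; std's Condvar stands in for event-listener::Event here.
struct PeerSlot {
    live: AtomicUsize,
    lock: Mutex<()>,
    reconnect: Condvar,
}

impl PeerSlot {
    fn thread_done(&self) {
        // the last of the two threads to terminate wakes the scheduler;
        // taking the lock before notifying avoids a lost wakeup
        if self.live.fetch_sub(1, Ordering::AcqRel) == 1 {
            let _g = self.lock.lock().unwrap();
            self.reconnect.notify_all();
        }
    }
}

// Returns the live-count seen once both connection threads are gone.
fn wait_for_teardown() -> usize {
    let slot = Arc::new(PeerSlot {
        live: AtomicUsize::new(2),
        lock: Mutex::new(()),
        reconnect: Condvar::new(),
    });
    for _ in 0..2 {
        let s = Arc::clone(&slot);
        thread::spawn(move || s.thread_done());
    }
    let mut g = slot.lock.lock().unwrap();
    while slot.live.load(Ordering::Acquire) != 0 {
        g = slot.reconnect.wait(g).unwrap();
    }
    // here the central peer list would schedule a re-connection
    slot.live.load(Ordering::Acquire)
}

fn main() {
    assert_eq!(wait_for_teardown(), 0);
}
```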
the filterpool acquires a mutex for writing to a shard
(allocate an array with 2^16 slots for that)
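A minimal sketch of that lock array, assuming (as above) that the shard index is the first two signature bytes; the payload type per slot is just a placeholder:

```rust
use std::sync::Mutex;

// 2^16 per-shard write locks, indexed by the first two signature bytes.
struct Shards {
    locks: Vec<Mutex<Vec<u8>>>, // slot payload type is a placeholder
}

impl Shards {
    fn new() -> Self {
        Shards { locks: (0..1usize << 16).map(|_| Mutex::new(Vec::new())).collect() }
    }

    fn write(&self, signature: &[u8], byte: u8) {
        let idx = u16::from_be_bytes([signature[0], signature[1]]) as usize;
        // a filterpool thread takes the shard mutex, then appends
        self.locks[idx].lock().unwrap().push(byte);
    }
}

fn main() {
    let shards = Shards::new();
    shards.write(&[0x12, 0x34, 0xff], 7);
    assert_eq!(shards.locks[0x1234].lock().unwrap().len(), 1);
}
```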
WIP/PROTO.txt (new file, +72)
the Elixir version uses an ASN.1 based format, but the support for ASN.1 in
languages other than C is surprisingly bad.
So I'll resort to a hard-coded format. Again.

----

InitMessage ::=
    capabilities:ShortList<u32(big-endian)>

(* the InitMessage is the first thing sent on a newly established connection.
 * both sides are expected to send the capabilities they support,
 * then wait for the other side to send theirs (meaning they don't
 * block each other during this), and then use just the capabilities
 * which both peers support *)

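The negotiation described in the comment boils down to a set intersection; a minimal sketch (`negotiate` is my name for it):

```rust
use std::collections::HashSet;

// Each side sends its capability list; afterwards only the intersection
// of both lists is used. Result is sorted for determinism.
fn negotiate(ours: &[u32], theirs: &[u32]) -> Vec<u32> {
    let theirs: HashSet<u32> = theirs.iter().copied().collect();
    let mut common: Vec<u32> =
        ours.iter().copied().filter(|c| theirs.contains(c)).collect();
    common.sort_unstable();
    common
}

fn main() {
    assert_eq!(negotiate(&[1, 2, 3], &[2, 3, 4]), vec![2, 3]);
}
```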
ProtoMessage ::=
    algorithm:SigAlgo
    pubkeylen:u16(big-endian)
    length:u16(big-endian)
    data:[length]u8=ProtoMsgKind<pubkeylen>

ProtoMsgKind<pubkeylen> ::=
      (* xfer *)          [0x00] XferSlice<pubkeylen>
    | (* summary-pull *)  [0x01] SummaryList<pubkeylen>
    | (* summary-ihave *) [0x02] SummaryList<pubkeylen>

(* after the initial capabilities are sent, both sides are expected to send
 * summaries for all public keys they know + all available slices for these
 * pubkeys *)

ShortList<T> ::= length:u16(big-endian) data:[length]T
XSList<T> ::= length:u8 data:[length]T

(* "raw" summary header overhead = 41 bytes, per 65535 slices *)
SummaryList<pubkeylen> ::=
    data:[*]Slice<pubkeylen> (* uses all remaining space in the packet *)

Slice<pubkeylen> ::=
    pubkey:[pubkeylen]u8
    start:u64
    length:u16

XferSlice<pubkeylen> ::=
    pubkey:[pubkeylen]u8
    siglen:u16(big-endian)
    start:u64
    data:[*]XferBlob<siglen> (* uses all remaining space in the packet *)

XferBlob<siglen> ::=
    signature:[siglen]u8 (* signature gets calculated over `blid ++ data` *)
    ttl:u8
    data:XferInner

XferInner ::=
      (* normal *) [0x00] XferNormal
    | (* prune *)  [0x01] (* empty; NOTE: this packet indicates that
       * all packets before it should be deleted on all servers;
       * used for data protection/privacy compliance;
       * although it is possible to delete such packets after a grace period
       * (e.g. 1 year) during which every peer should've had a chance to retrieve
       * it, doing that is discouraged *)

XferNormal ::=
    attrs:ShortList<KeyValuePair>
    data:ShortList<u8>

KeyValuePair ::=
    key:XSList<u8>
    value:ShortList<u8>

SigAlgo ::=
    (* ed25519 *) [0x65, 0x64, 0xff, 0x13]

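Reading the fixed part of a ProtoMessage (4-byte SigAlgo id, then pubkeylen and length as big-endian u16s) could look like this sketch; `parse_header` is my name, and only the ed25519 id from the grammar is recognized:

```rust
// The ed25519 SigAlgo identifier from the grammar above.
const SIGALGO_ED25519: [u8; 4] = [0x65, 0x64, 0xff, 0x13];

// Parse the fixed 8-byte ProtoMessage header; returns (pubkeylen, length).
fn parse_header(buf: &[u8; 8]) -> Option<(u16, u16)> {
    if buf[0..4] != SIGALGO_ED25519 {
        return None; // unknown signature algorithm
    }
    let pubkeylen = u16::from_be_bytes([buf[4], buf[5]]);
    let length = u16::from_be_bytes([buf[6], buf[7]]);
    Some((pubkeylen, length))
}

fn main() {
    let hdr = [0x65, 0x64, 0xff, 0x13, 0x00, 0x20, 0x01, 0x00];
    assert_eq!(parse_header(&hdr), Some((32, 256)));
}
```

The u16 `length` field is also what enforces the 64KiB message cap from the architecture notes: nothing larger is even representable on the wire.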
WIP/SPOOLER.txt (new file, +55)
spool file organization:

the directory tree looks like this:
    {SigAlgo:04x}/{:02x}/{:02x}/...
the first level is the SigAlgo identifier of the signature algorithm in use.
the second and third levels are the first 2 bytes of the public key of a party,
encoded as hexadecimal.

the public keys all have the same length, and get encoded as urlsafe base64.

## per public key ...

... there can be a bunch of associated files:
- {pubkey}.{chunkid}.slcs contains slice listings (* =SLCS *)
- {pubkey}.{chunkid}.data contains the actual data (* =DataS *)
- {pubkey}.lock is a lock file for the public key, which gets used to prevent overlapping writes and GCs...

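A sketch of building such a directory path; note this assumes the SigAlgo identifier is simply hex-encoded for the first level (the `{SigAlgo:04x}` notation above leaves the exact width open), and `spool_dir` is a made-up name:

```rust
// Build the spool directory for a pubkey: hex of the SigAlgo id, then the
// first two pubkey bytes as hex sub-levels (an assumption; the notes leave
// the exact rendering of the SigAlgo level open).
fn spool_dir(sigalgo: &[u8; 4], pubkey: &[u8]) -> String {
    let algo: String = sigalgo.iter().map(|b| format!("{:02x}", b)).collect();
    format!("{}/{:02x}/{:02x}", algo, pubkey[0], pubkey[1])
}

fn main() {
    let p = spool_dir(&[0x65, 0x64, 0xff, 0x13], &[0xab, 0xcd, 0xee]);
    assert_eq!(p, "6564ff13/ab/cd");
}
```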
<proto>

SLCS ::= [*]SlicePtr

SlicePtr ::= (* 16 bytes per entry *)
    (* slice boundaries *)
    slice:Slice
    (* data boundaries *)
    dptr:Pointer

Slice ::=
    start:u64(big-endian)
    length:u16(big-endian)

Pointer ::=
    start:u32(big-endian)
    length:u16(big-endian)

DataS ::=
    (* @siglen is inferred from the used SigAlgo *)
    [*]XferBlob<@siglen>

</proto>

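The SlicePtr layout above (u64 + u16 slice boundaries, u32 + u16 data pointer) indeed adds up to 16 bytes per entry; a round-trip sketch with my own field names:

```rust
// On-disk SlicePtr: Slice (u64 start + u16 length) then Pointer
// (u32 start + u16 length), all big-endian, 16 bytes total.
struct SlicePtr {
    slice_start: u64,
    slice_len: u16,
    data_start: u32,
    data_len: u16,
}

impl SlicePtr {
    fn to_bytes(&self) -> [u8; 16] {
        let mut out = [0u8; 16];
        out[0..8].copy_from_slice(&self.slice_start.to_be_bytes());
        out[8..10].copy_from_slice(&self.slice_len.to_be_bytes());
        out[10..14].copy_from_slice(&self.data_start.to_be_bytes());
        out[14..16].copy_from_slice(&self.data_len.to_be_bytes());
        out
    }

    fn from_bytes(b: &[u8; 16]) -> Self {
        SlicePtr {
            slice_start: u64::from_be_bytes(b[0..8].try_into().unwrap()),
            slice_len: u16::from_be_bytes(b[8..10].try_into().unwrap()),
            data_start: u32::from_be_bytes(b[10..14].try_into().unwrap()),
            data_len: u16::from_be_bytes(b[14..16].try_into().unwrap()),
        }
    }
}

fn main() {
    let p = SlicePtr { slice_start: 7, slice_len: 2, data_start: 99, data_len: 512 };
    let rt = SlicePtr::from_bytes(&p.to_bytes());
    assert_eq!(rt.slice_start, 7);
    assert_eq!(rt.data_len, 512);
}
```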
... for compaction, the corresponding pubkey gets locked,
- new temporary files are created in the corresponding directory,
- the slices get sorted, and for each blob the length gets calculated,
- summed up starting from the newest blob, going in reverse,
- until we hit the maximum size per pubkey
  (usually available storage space * 0.8 divided by the number of known pubkeys)
- then we cut off the remaining, not yet processed blobs
- and now start from the first kept blob going forward
- write all of them to a new data file, and create a corresponding slice listing file
- note that adjacent slices in the slice listing file also get merged

the compaction algorithm should run roughly once every 15 minutes, but only visit
pubkeys to which data has been appended in that timespan.
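The size-budget step of the list above can be sketched as follows; `keep_from` and the budget value are mine, and the actual file rewriting is left out:

```rust
// Walk blob lengths from newest to oldest, summing until the per-pubkey
// budget is hit; returns the index of the first (oldest) blob to keep.
// Everything before that index gets cut off by compaction.
fn keep_from(blob_lens: &[u64], budget: u64) -> usize {
    let mut total = 0u64;
    let mut first_kept = blob_lens.len();
    for (i, len) in blob_lens.iter().enumerate().rev() {
        if total + len > budget {
            break; // older blobs no longer fit the budget
        }
        total += len;
        first_kept = i;
    }
    first_kept
}

fn main() {
    // oldest..newest blob sizes; with budget 100 only the last two fit
    assert_eq!(keep_from(&[80, 30, 40, 50], 100), 2);
}
```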
|