diff --git a/WIP/SPOOLER.txt b/WIP/SPOOLER.txt
index 3c42b31..eaf6c6f 100644
--- a/WIP/SPOOLER.txt
+++ b/WIP/SPOOLER.txt
@@ -11,27 +11,15 @@ the public keys have all the same length, and get encoded as urlsafe base64.
 
 ## per public keys ...
 ... there can be a bunch of associated files.
-- {pubkey}.{chunkid}.slcs contains slice listings (* =SLCS *)
-- {pubkey}.{chunkid}.data contains the actual data (* =DataS *)
+- {pubkey}.{start}.data contains the actual data (* =DataS; {start} is the urlsafe base64 encoded starting point *)
+- {pubkey}.{start}.meta contains the offsets of data blobs (* =MetaS *)
 - {pubkey}.lock is a lock file for the public key,
   which gets used to prevent overlapping writes and GCs...
 
 
-SLCS ::= [*]SlicePtr
+MetaS ::= [*]MetaEntry
 
-SlicePtr ::= (* 16 bytes per entry *)
-  (* slice boundaries *)
-  slice:Slice
-  (* data boundaries *)
-  dptr:Pointer
-
-Slice ::=
-  start:u64(big-endian)
-  length:u16(big-endian)
-
-Pointer ::=
-  start:u32(big-endian)
-  length:u16(big-endian)
+MetaEntry ::= offset:u32(big-endian)
 
 DataS ::=
   (* @siglen is inferred from the used SigAlgo *)
@@ -41,15 +29,13 @@ DataS ::=
 
 ... for compaction, the corresponding pubkey gets locked,
 - new temporary files are created in the corresponding directory,
-- the slices get sorted, and for each blob the length gets calculated,
-- summed up starting from the newest blob, going reverse,
-- until we hit the maximum size per pubkey
+- the chunks get sorted, and starting from the newest blob, going in reverse,
+- the sizes of the slices get calculated,
+  until a slice is found which hits the maximum size per pubkey
   (usually available storage space * 0.8 divided by the amount of known pubkeys)
-- then we cut off the remaining, not yet processed blobs
+- then we cut off the remaining, not yet processed blobs/slices
 - and start now from the first kept blob going forward
-- write all of them to a new data file, and create a corresponding slice listing file
-- note that adjacent slices in the slice listing file also get merged
+- write all of them to a new data file, and create a corresponding metadata file
 
 the compaction algorithm should run roughly once every 15 minutes,
 but only visit pubkeys to which data has been appended to in that timespan.
-
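
a minimal sketch (not part of the patch) of how the new layout could be read back,
assuming each MetaEntry offset marks the start of a blob in the matching .data file
and the data file's length bounds the last blob; the grammar above doesn't pin that
down, so treat it as an assumption, and the file names and helper names are illustrative:

use std::fs;
use std::io;

/// parse a .meta file into its offsets (MetaS ::= [*]MetaEntry, big-endian u32)
fn read_meta(path: &str) -> io::Result<Vec<u32>> {
    let raw = fs::read(path)?;
    if raw.len() % 4 != 0 {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "truncated MetaEntry"));
    }
    Ok(raw
        .chunks_exact(4)
        .map(|b| u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
        .collect())
}

/// derive the (start, end) byte ranges of the blobs inside the .data file
fn blob_ranges(offsets: &[u32], data_len: u32) -> Vec<(u32, u32)> {
    offsets
        .iter()
        .enumerate()
        .map(|(i, &start)| (start, offsets.get(i + 1).copied().unwrap_or(data_len)))
        .collect()
}

fn main() -> io::Result<()> {
    // hypothetical file names following the {pubkey}.{start}.{data,meta} scheme
    let offsets = read_meta("PUBKEY.START.meta")?;
    let data_len = fs::metadata("PUBKEY.START.data")?.len() as u32;
    for (start, end) in blob_ranges(&offsets, data_len) {
        println!("blob at {}..{} ({} bytes)", start, end, end - start);
    }
    Ok(())
}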
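
the size-budget walk in the compaction description can be sketched as follows (again
not part of the patch): walk the blobs newest to oldest, sum their sizes, stop once
the per-pubkey maximum is hit, drop everything older, then rewrite the kept blobs
oldest-first; whether the blob that overshoots the budget is kept or dropped is an
interpretation (dropped here), and the Blob type and helper names are made up for the sketch:

/// a blob already present in a pubkey's spool
#[derive(Clone)]
struct Blob {
    start: u64,   // logical starting point (the {start} in the file names)
    data: Vec<u8>,
}

/// usually: available storage space * 0.8, divided by the amount of known pubkeys
fn max_bytes_per_pubkey(available_bytes: u64, known_pubkeys: u64) -> u64 {
    (available_bytes as f64 * 0.8 / known_pubkeys as f64) as u64
}

/// blobs to keep, oldest first, given `blobs` sorted oldest -> newest
fn select_for_compaction(blobs: &[Blob], max_bytes: u64) -> Vec<Blob> {
    let mut kept = Vec::new();
    let mut total: u64 = 0;
    // starting from the newest blob, going in reverse
    for blob in blobs.iter().rev() {
        total += blob.data.len() as u64;
        if total > max_bytes {
            // cut off the remaining, not yet processed blobs
            break;
        }
        kept.push(blob.clone());
    }
    // start from the first kept blob going forward again
    kept.reverse();
    kept
}

fn main() {
    let blobs = vec![
        Blob { start: 0,   data: vec![0u8; 600] },
        Blob { start: 600, data: vec![0u8; 300] },
        Blob { start: 900, data: vec![0u8; 200] },
    ];
    let budget = max_bytes_per_pubkey(1_000, 1); // 800 bytes for a single pubkey
    for b in select_for_compaction(&blobs, budget) {
        // only the two newest blobs (500 bytes total) fit the 800-byte budget
        println!("keep blob starting at {} ({} bytes)", b.start, b.data.len());
    }
}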