spooler: improve queue metadata

Alain Zscheile 2022-11-30 18:30:45 +01:00
parent c54df77319
commit a494dd4517


@@ -11,27 +11,15 @@ the public keys all have the same length, and are encoded as urlsafe base64.
## per public key ...
... there can be a bunch of associated files.
- {pubkey}.{chunkid}.slcs contains slice listings (* =SLCS *)
- {pubkey}.{chunkid}.data contains the actual data (* =DataS *)
- {pubkey}.{start}.data contains the actual data (* =DataS; {start} is the urlsafe base64 encoded starting point *)
- {pubkey}.{start}.meta contains the offsets of data blobs (* =MetaS *)
- {pubkey}.lock is a lock file for the public key, used to prevent overlapping writes and GC runs (a layout sketch follows this list)
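
As a rough illustration of the file layout listed above, here is a small Rust sketch that builds the per-pubkey file names. The `SpoolDir` type, the function names and the use of the `base64` crate (with unpadded urlsafe encoding) are assumptions for illustration, not part of the spooler's actual code; the spec also does not pin down how {start} is encoded, so a big-endian u64 starting point is assumed here.

```rust
use std::path::PathBuf;

use base64::Engine; // assumed dependency: the `base64` crate

/// Hypothetical handle to a spool directory; name and shape are assumptions.
struct SpoolDir {
    root: PathBuf,
}

impl SpoolDir {
    /// urlsafe base64 encoding of a public key, as used in the file names.
    fn encode_pubkey(pubkey: &[u8]) -> String {
        base64::engine::general_purpose::URL_SAFE_NO_PAD.encode(pubkey)
    }

    /// {pubkey}.{start}.data — the actual data (assuming {start} is a
    /// big-endian u64 starting point, urlsafe-base64 encoded).
    fn data_file(&self, pubkey: &[u8], start: u64) -> PathBuf {
        let start_b64 =
            base64::engine::general_purpose::URL_SAFE_NO_PAD.encode(start.to_be_bytes());
        self.root
            .join(format!("{}.{}.data", Self::encode_pubkey(pubkey), start_b64))
    }

    /// {pubkey}.{start}.meta — the offsets of the data blobs (MetaS).
    fn meta_file(&self, pubkey: &[u8], start: u64) -> PathBuf {
        self.data_file(pubkey, start).with_extension("meta")
    }

    /// {pubkey}.lock — guards against overlapping writes and GC runs.
    fn lock_file(&self, pubkey: &[u8]) -> PathBuf {
        self.root
            .join(format!("{}.lock", Self::encode_pubkey(pubkey)))
    }
}
```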
<proto>
SLCS ::= [*]SlicePtr
MetaS ::= [*]MetaEntry
SlicePtr ::= (* 16 bytes per entry *)
(* slice boundaries *)
slice:Slice
(* data boundaries *)
dptr:Pointer
Slice ::=
start:u64(big-endian)
length:u16(big-endian)
Pointer ::=
start:u32(big-endian)
length:u16(big-endian)
MetaEntry ::= offset:u32(big-endian)
DataS ::=
(* @siglen is inferred from the used SigAlgo *)
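
Per the grammar above, a MetaS file is just a flat array of big-endian u32 offsets (one MetaEntry per data blob). A minimal decoding sketch, assuming the whole .meta file has already been read into a byte buffer (the function name is hypothetical):

```rust
/// Decode a MetaS listing: a flat sequence of big-endian u32 offsets
/// (one MetaEntry per data blob). Hypothetical helper for illustration.
fn decode_metas(buf: &[u8]) -> Result<Vec<u32>, &'static str> {
    if buf.len() % 4 != 0 {
        return Err("MetaS length is not a multiple of 4");
    }
    Ok(buf
        .chunks_exact(4)
        .map(|c| u32::from_be_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}
```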
@@ -41,15 +29,13 @@ DataS ::=
... for compaction, the corresponding pubkey gets locked,
- new temporary files are created in the corresponding directory,
- the slices get sorted, and for each blob the length gets calculated,
- summed up starting from the newest blob, going reverse,
- until we hit the maximum size per pubkey
- the chunks get sorted, and starting from the newest blob, going reverse
- the size of the slices gets calculated,
until a slice is found which hits the maximum size per pubkey
(usually available storage space * 0.8, divided by the number of known pubkeys; a sketch of this accumulation follows the list)
- then we cut off the remaining, not yet processed blobs
- then we cut off the remaining, not yet processed blobs/slices
- and now start from the first kept blob, going forward
- write all of them to a new data file, and create a corresponding slice listing file
- note that adjacent slices in the slice listing file also get merged
- write all of them to a new data file, and create a corresponding metadata file
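
The accumulation step above can be read as: walk the blobs from the newest backwards, summing their sizes, and stop once the per-pubkey budget would be exceeded; everything older than the stopping point gets cut off. A rough Rust sketch under those assumptions (the `Blob` type, the function name, and the decision to also drop the blob that crosses the budget are illustrative guesses, since the text leaves that boundary case open):

```rust
/// One data blob as seen by compaction; hypothetical type for illustration.
struct Blob {
    start: u64,
    len: u64, // length in bytes
}

/// Returns the index of the oldest blob to keep. `blobs` is assumed to be
/// sorted by `start`, oldest first; iteration runs newest -> oldest.
fn compaction_cutoff(blobs: &[Blob], available_space: u64, known_pubkeys: u64) -> usize {
    // "usually available storage space * 0.8, divided by the number of known pubkeys"
    let budget = (available_space as f64 * 0.8 / known_pubkeys.max(1) as f64) as u64;
    let mut used = 0u64;
    for (i, blob) in blobs.iter().enumerate().rev() {
        used += blob.len;
        if used > budget {
            // the remaining, not yet processed blobs (indices 0..=i) get cut off
            return i + 1;
        }
    }
    0 // everything fits within the budget; keep all blobs
}
```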
the compaction algorithm should run roughly once every 15 minutes, but only visit
pubkeys to which data has been appended in that timespan.
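
A minimal sketch of that scheduling policy, assuming the append path records "dirty" pubkeys in a shared set; all names here (`compaction_loop`, `compact_pubkey`) are hypothetical:

```rust
use std::collections::HashSet;
use std::sync::Mutex;
use std::time::Duration;

/// Run compaction roughly every 15 minutes, visiting only pubkeys that have
/// seen appends since the previous run. Hypothetical driver for illustration.
fn compaction_loop(dirty: &Mutex<HashSet<Vec<u8>>>) {
    loop {
        std::thread::sleep(Duration::from_secs(15 * 60));
        // take the current dirty set, leaving an empty one for the appenders
        let to_visit = std::mem::take(&mut *dirty.lock().unwrap());
        for pubkey in to_visit {
            compact_pubkey(&pubkey);
        }
    }
}

/// Placeholder for locking {pubkey}.lock and running the compaction steps above.
fn compact_pubkey(_pubkey: &[u8]) {}
```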