spooler: improve queue metadata

Alain Zscheile 2022-11-30 18:30:45 +01:00
parent c54df77319
commit a494dd4517


@@ -11,27 +11,15 @@ the public keys have all the same length, and get encoded as urlsafe base64.
 ## per public keys ...
 ... there can be a bunch of associated files.
-- {pubkey}.{chunkid}.slcs contains slice listings (* =SLCS *)
-- {pubkey}.{chunkid}.data contains the actual data (* =DataS *)
+- {pubkey}.{start}.data contains the actual data (* =DataS; {start} is the urlsafe base64 encoded starting point *)
+- {pubkey}.{start}.meta contains the offsets of data blobs (* =MetaS *)
 - {pubkey}.lock is a lock file for the public key, which gets used to prevent overlapping writes and GCs...
 <proto>
-SLCS ::= [*]SlicePtr
-SlicePtr ::= (* 16 bytes per entry *)
-  (* slice boundaries *)
-  slice:Slice
-  (* data boundaries *)
-  dptr:Pointer
-Slice ::=
-  start:u64(big-endian)
-  length:u16(big-endian)
-Pointer ::=
-  start:u32(big-endian)
-  length:u16(big-endian)
+MetaS ::= [*]MetaEntry
+MetaEntry ::= offset:u32(big-endian)
 DataS ::=
   (* @siglen is inferred from the used SigAlgo *)
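The new metadata file is deliberately small: a {pubkey}.{start}.meta file is just a flat array of big-endian u32 offsets into the matching {pubkey}.{start}.data file, 4 bytes per MetaEntry instead of the 16-byte SlicePtr entries it replaces. As a rough illustration only, here is a minimal Rust sketch of reading such a file pair; the helper names and file paths are made up, and it assumes each offset marks where a blob starts, so consecutive entries (or end of file) give the blob boundaries.

```rust
use std::fs;
use std::io;

/// Parse a MetaS file: a flat array of big-endian u32 offsets (MetaEntry).
/// Sketch only; assumes each offset points at the start of a data blob
/// inside the corresponding `{pubkey}.{start}.data` file.
fn parse_meta(raw: &[u8]) -> io::Result<Vec<u32>> {
    if raw.len() % 4 != 0 {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "truncated MetaEntry"));
    }
    Ok(raw
        .chunks_exact(4)
        .map(|c| u32::from_be_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}

/// Split the data file into blobs using consecutive offsets as boundaries;
/// the last blob runs to the end of the file. Hypothetical helper, not part
/// of the spooler code, and without any bounds checking.
fn split_blobs<'a>(data: &'a [u8], offsets: &[u32]) -> Vec<&'a [u8]> {
    let mut blobs = Vec::with_capacity(offsets.len());
    for (i, &start) in offsets.iter().enumerate() {
        let end = offsets.get(i + 1).map(|&o| o as usize).unwrap_or(data.len());
        blobs.push(&data[start as usize..end]);
    }
    blobs
}

fn main() -> io::Result<()> {
    // file names follow the layout described above; the concrete
    // pubkey/start values here are placeholders
    let meta = fs::read("PUBKEY.START.meta")?;
    let data = fs::read("PUBKEY.START.data")?;
    let offsets = parse_meta(&meta)?;
    for (i, blob) in split_blobs(&data, &offsets).iter().enumerate() {
        println!("blob {}: {} bytes", i, blob.len());
    }
    Ok(())
}
```

Compared to the old SLCS listing, slice boundaries and lengths are no longer stored explicitly; only the blob start offsets remain.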
@@ -41,15 +29,13 @@ DataS ::=
 ... for compaction, the corresponding pubkey gets locked,
 - new temporary files are created in the corresponding directory,
-- the slices get sorted, and for each blob the length gets calculated,
-- summed up starting from the newest blob, going reverse,
-- until we hit the maximum size per pubkey
+- the chunks get sorted, and starting from the newest blob, going reverse,
+- the size of the slices gets calculated,
+  until a slice is found which hits the maximum size per pubkey
   (usually available storage space * 0.8 divided by the amount of known pubkeys)
-- then we cut off the remaining, not yet processed blobs
+- then we cut off the remaining, not yet processed blobs/slice
 - and start now from the first kept blob going forward
-- write all of them to a new data file, and create a corresponding slice listing file
-- note that adjacent slices in the slice listing file also get merged
+- write all of them to a new data file, and create a corresponding metadata file
 the compaction algorithm should run roughly once every 15 minutes, but only visit
 pubkeys to which data has been appended in that timespan.
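To make the cutoff step concrete, here is a hedged Rust sketch of the size-based selection described above: walk the blobs from newest to oldest, add up their sizes until the per-pubkey budget would be exceeded, and keep only the tail that still fits. Function names, the budget arithmetic, and the surrounding file handling are assumptions for illustration, not the spooler's actual implementation.

```rust
/// Return the index of the first blob to keep: iterate newest -> oldest,
/// accumulating sizes until the per-pubkey budget would be exceeded.
/// Hypothetical helper, names are not from the spooler code.
fn first_kept_blob(blob_sizes: &[u64], max_size_per_pubkey: u64) -> usize {
    let mut total = 0u64;
    for (idx, &len) in blob_sizes.iter().enumerate().rev() {
        if total + len > max_size_per_pubkey {
            // this blob no longer fits; keep everything after it
            return idx + 1;
        }
        total += len;
    }
    0 // everything fits, keep all blobs
}

/// "usually available storage space * 0.8 divided by the amount of known pubkeys"
fn budget(available_space: u64, known_pubkeys: u64) -> u64 {
    (available_space as f64 * 0.8 / known_pubkeys as f64) as u64
}

fn main() {
    let sizes = [400u64, 300, 200, 100]; // oldest .. newest, placeholder values
    let keep_from = first_kept_blob(&sizes, budget(1_000, 1));
    // the kept blobs (keep_from..) would then be rewritten forward into new
    // temporary .data/.meta files and swapped in while the pubkey lock is held
    println!("keep blobs starting at index {}", keep_from);
}
```

The rewrite itself would presumably happen under the {pubkey}.lock file mentioned above, so appends and GC/compaction runs cannot interleave.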