spooler: improve queue metadata

Alain Zscheile 2022-11-30 18:30:45 +01:00
parent c54df77319
commit a494dd4517


@@ -11,27 +11,15 @@ the public keys all have the same length, and are encoded as urlsafe base64.
## per public key ...
... there can be a bunch of associated files.
- {pubkey}.{chunkid}.slcs contains slice listings (* =SLCS *)
- {pubkey}.{chunkid}.data contains the actual data (* =DataS *)
- {pubkey}.{start}.data contains the actual data (* =DataS; {start} is the urlsafe base64 encoded starting point *)
- {pubkey}.{start}.meta contains the offsets of data blobs (* =MetaS *)
- {pubkey}.lock is a lock file for the public key, used to prevent overlapping writes and GC runs (a layout sketch follows this list)
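
As a rough illustration of the file layout listed above, here is a small Rust sketch that builds the per-pubkey file names. The `SpoolDir` type, the function names and the use of the `base64` crate (with unpadded urlsafe encoding) are assumptions for illustration, not part of the spooler's actual code; the spec also does not pin down how {start} is encoded, so a big-endian u64 starting point is assumed here.

```rust
use std::path::PathBuf;

use base64::Engine; // assumed dependency: the `base64` crate

/// Hypothetical handle to a spool directory; name and shape are assumptions.
struct SpoolDir {
    root: PathBuf,
}

impl SpoolDir {
    /// urlsafe base64 encoding of a public key, as used in the file names.
    fn encode_pubkey(pubkey: &[u8]) -> String {
        base64::engine::general_purpose::URL_SAFE_NO_PAD.encode(pubkey)
    }

    /// {pubkey}.{start}.data — the actual data (assuming {start} is a
    /// big-endian u64 starting point, urlsafe-base64 encoded).
    fn data_file(&self, pubkey: &[u8], start: u64) -> PathBuf {
        let start_b64 =
            base64::engine::general_purpose::URL_SAFE_NO_PAD.encode(start.to_be_bytes());
        self.root
            .join(format!("{}.{}.data", Self::encode_pubkey(pubkey), start_b64))
    }

    /// {pubkey}.{start}.meta — the offsets of the data blobs (MetaS).
    fn meta_file(&self, pubkey: &[u8], start: u64) -> PathBuf {
        self.data_file(pubkey, start).with_extension("meta")
    }

    /// {pubkey}.lock — guards against overlapping writes and GC runs.
    fn lock_file(&self, pubkey: &[u8]) -> PathBuf {
        self.root
            .join(format!("{}.lock", Self::encode_pubkey(pubkey)))
    }
}
```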
<proto>
SLCS ::= [*]SlicePtr
MetaS ::= [*]MetaEntry
SlicePtr ::= (* 16 bytes per entry *)
(* slice boundaries *)
slice:Slice
(* data boundaries *)
dptr:Pointer
Slice ::=
start:u64(big-endian)
length:u16(big-endian)
Pointer ::=
start:u32(big-endian)
length:u16(big-endian)
MetaEntry ::= offset:u32(big-endian)
DataS ::=
(* @siglen is inferred from the used SigAlgo *)
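
Per the grammar above, a MetaS file is just a flat array of big-endian u32 offsets (one MetaEntry per data blob). A minimal decoding sketch, assuming the whole .meta file has already been read into a byte buffer (the function name is hypothetical):

```rust
/// Decode a MetaS listing: a flat sequence of big-endian u32 offsets
/// (one MetaEntry per data blob). Hypothetical helper for illustration.
fn decode_metas(buf: &[u8]) -> Result<Vec<u32>, &'static str> {
    if buf.len() % 4 != 0 {
        return Err("MetaS length is not a multiple of 4");
    }
    Ok(buf
        .chunks_exact(4)
        .map(|c| u32::from_be_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}
```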
@@ -41,15 +29,13 @@ DataS ::=
... for compaction, the corresponding pubkey gets locked,
- new temporary files are created in the corresponding directory,
- the slices get sorted, and for each blob the length gets calculated,
- summed up starting from the newest blob, going reverse,
- until we hit the maximum size per pubkey
- the chunks get sorted, and starting from the newest blob, going reverse
- the size of the slices gets calculated,
until a slice is found which hits the maximum size per pubkey
(usually available storage space * 0.8, divided by the number of known pubkeys; a sketch of this accumulation follows the list)
- then we cut off the remaining, not yet processed blobs
- then we cut off the remaining, not yet processed blobs/slices
- and now start from the first kept blob, going forward
- write all of them to a new data file, and create a corresponding slice listing file
- note that adjacent slices in the slice listing file also get merged
- write all of them to a new data file, and create a corresponding metadata file
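
The accumulation step above can be read as: walk the blobs from the newest backwards, summing their sizes, and stop once the per-pubkey budget would be exceeded; everything older than the stopping point gets cut off. A rough Rust sketch under those assumptions (the `Blob` type, the function name, and the decision to also drop the blob that crosses the budget are illustrative guesses, since the text leaves that boundary case open):

```rust
/// One data blob as seen by compaction; hypothetical type for illustration.
struct Blob {
    start: u64,
    len: u64, // length in bytes
}

/// Returns the index of the oldest blob to keep. `blobs` is assumed to be
/// sorted by `start`, oldest first; iteration runs newest -> oldest.
fn compaction_cutoff(blobs: &[Blob], available_space: u64, known_pubkeys: u64) -> usize {
    // "usually available storage space * 0.8, divided by the number of known pubkeys"
    let budget = (available_space as f64 * 0.8 / known_pubkeys.max(1) as f64) as u64;
    let mut used = 0u64;
    for (i, blob) in blobs.iter().enumerate().rev() {
        used += blob.len;
        if used > budget {
            // the remaining, not yet processed blobs (indices 0..=i) get cut off
            return i + 1;
        }
    }
    0 // everything fits within the budget; keep all blobs
}
```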
the compaction algorithm should run roughly once every 15 minutes, but only visit
pubkeys to which data has been appended in that timespan.
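
A minimal sketch of that scheduling policy, assuming the append path records "dirty" pubkeys in a shared set; all names here (`compaction_loop`, `compact_pubkey`) are hypothetical:

```rust
use std::collections::HashSet;
use std::sync::Mutex;
use std::time::Duration;

/// Run compaction roughly every 15 minutes, visiting only pubkeys that have
/// seen appends since the previous run. Hypothetical driver for illustration.
fn compaction_loop(dirty: &Mutex<HashSet<Vec<u8>>>) {
    loop {
        std::thread::sleep(Duration::from_secs(15 * 60));
        // take the current dirty set, leaving an empty one for the appenders
        let to_visit = std::mem::take(&mut *dirty.lock().unwrap());
        for pubkey in to_visit {
            compact_pubkey(&pubkey);
        }
    }
}

/// Placeholder for locking {pubkey}.lock and running the compaction steps above.
fn compact_pubkey(_pubkey: &[u8]) {}
```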