initial commit

Alain Zscheile 2023-01-07 17:59:35 +01:00
commit 66d54b0f2a

docs/index.gmi Normal file

@ -0,0 +1,87 @@
# Goals
"yglnk" is a document linking language. In contrast to a classic linker script or such, it is also used to implement glue for bidirectional hyperlinks.
* contains multiple sections, and supports pointers between sections
* sections are 16-byte aligned
* needs to support linking of content files to style files
* needs to itself support links to other such yglnk files
* binary for compactness
* links can contain additional metadata (e.g. name, and arbitrary key-value pairs)
* strings are prefixed by their length to avoid costly separator parsing
This format supersedes "gardglue".
# Serialization
All integers are encoded as big-endian.
## Header
```
header := magic[4b] generator[4b] type[4b] version[4b] tstr_loc[4b] reserved[12b]
```
The file magic at offset 0 is "YgLn" = 0x5967'4c6e. The header itself forms a section, and contains the types of the top-level sections. The primary linear table follows the first 32 bytes.
The top-level linear table has `entsize = 2`.
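The header layout above can be sketched as a fixed 32-byte big-endian record. This is a minimal sketch, not a reference implementation: the function name `parse_header` is hypothetical, and reading `generator`, `type`, `version` and `tstr_loc` as unsigned 32-bit integers is an assumption, since the text only gives their widths.

```python
import struct

MAGIC = b"YgLn"
# magic[4b] generator[4b] type[4b] version[4b] tstr_loc[4b] reserved[12b]
# ">" selects big-endian with no padding, so the record is exactly 32 bytes.
HEADER = struct.Struct(">4sIIII12s")


def parse_header(buf: bytes) -> dict:
    """Decode the 32-byte header (hypothetical helper, field types assumed)."""
    if len(buf) < HEADER.size:
        raise ValueError("header needs at least 32 bytes")
    magic, generator, typ, version, tstr_loc, reserved = HEADER.unpack_from(buf, 0)
    if magic != MAGIC:
        raise ValueError(f"bad magic: {magic!r}")
    return {
        "generator": generator,
        "type": typ,
        "version": version,
        "tstr_loc": tstr_loc,
        "reserved": reserved,
        # The primary linear table starts right after the header.
        "table_offset": HEADER.size,
    }
```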
## Types
```
0x00000000 T_PLAIN_TEXT (UTF-8)
0x00000001 T_NESTED_TEXT
0x00000010 T_LINEAR_PLAIN_TAB
0x00000020 T_HASH_PLAIN_TAB
0x00000021 T_HASH_LINK_TAB
0x00000030 T_2DHC_PLAIN_TAB
0x00000031 T_2DHC_LINK_TAB
```
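For readers who consume these codes programmatically, the list above maps directly onto an enumeration. A sketch only: the class name `SectionType` and the helper `is_link_table` are hypothetical, and the helper simply groups the two `*_LINK_TAB` codes.

```python
from enum import IntEnum


class SectionType(IntEnum):
    """Type codes copied from the list above (class name is hypothetical)."""
    T_PLAIN_TEXT = 0x00000000  # UTF-8
    T_NESTED_TEXT = 0x00000001
    T_LINEAR_PLAIN_TAB = 0x00000010
    T_HASH_PLAIN_TAB = 0x00000020
    T_HASH_LINK_TAB = 0x00000021
    T_2DHC_PLAIN_TAB = 0x00000030
    T_2DHC_LINK_TAB = 0x00000031


LINK_TABLE_TYPES = frozenset(
    {SectionType.T_HASH_LINK_TAB, SectionType.T_2DHC_LINK_TAB}
)


def is_link_table(type_code: int) -> bool:
    """True for the two link-table type codes."""
    return type_code in LINK_TABLE_TYPES
```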
## External link table (0x21, T_HASH_LINK_TAB; 0x31, T_2DHC_LINK_TAB)
A hash table or "hilbert curve" table (see corresponding sections). Used to reference external content and facilitate its lookup.
## Linear Tables
A simple list of entries. An all-zero entry marks the end of the list. The actual byte location is `location << 4`, because names and sections are aligned to 16-byte boundaries. `rest` holds optional additional data, also aligned to 16 bytes. `entsize * entcount << 4` is the length of each entry.
```
list entry := name[4b] type[4b] location[4b] entsize[2b] entcount[2b] rest[*]
```
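The fixed part of a list entry is 16 bytes, so it can be decoded in one step. The sketch below is hedged: `ListEntry` and `decode_entry` are hypothetical names, the fields are assumed to be unsigned, the terminator check looks only at the fixed 16 bytes, and the handling of `rest` is left out because its layout is not specified here.

```python
import struct
from typing import NamedTuple, Optional

# name[4b] type[4b] location[4b] entsize[2b] entcount[2b], big-endian
ENTRY = struct.Struct(">IIIHH")


class ListEntry(NamedTuple):
    name: int
    typ: int
    location: int
    entsize: int
    entcount: int

    @property
    def byte_offset(self) -> int:
        # Locations are stored in units of 16 bytes.
        return self.location << 4

    @property
    def length_bytes(self) -> int:
        # The length expression from the text, with the product taken first.
        return (self.entsize * self.entcount) << 4


def decode_entry(buf: bytes, offset: int = 0) -> Optional[ListEntry]:
    """Decode one entry, or return None for the all-zero terminator."""
    raw = buf[offset:offset + ENTRY.size]
    if len(raw) < ENTRY.size:
        raise ValueError("truncated list entry")
    if raw == bytes(ENTRY.size):
        return None
    return ListEntry(*ENTRY.unpack(raw))
```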
## Hash tables
A hash table using 64-bit xxHash. Chains are traversed in order. The last bit of the first 8 bytes of a chain entry indicates whether another entry follows (0 = last entry); the remaining bits contain the hash.
```
ht header := strtab_link[8b] nbuckets[4b] nvals[4b] nblf[4b] blshift[4b]
ht body := bloom[4b * nblf] buckets[4b * nbuckets] chains[32b * nvals]
chain entry := hash[8b] name[4b] type[4b] value[16b]
```
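Chain traversal can be sketched as follows. Several points are assumptions rather than part of the text: "last bit" is read as the least significant bit of the big-endian 8-byte word, entries of one chain are taken to be contiguous, and the names `ChainEntry`, `decode_chain_entry` and `walk_chain` are hypothetical. Computing the xxHash itself and the bucket and bloom lookup are left out.

```python
import struct
from typing import Iterator, NamedTuple

# strtab_link[8b] nbuckets[4b] nvals[4b] nblf[4b] blshift[4b]
HT_HEADER = struct.Struct(">QIIII")
# hash[8b] name[4b] type[4b] value[16b]
CHAIN_ENTRY = struct.Struct(">QII16s")

HASH_MASK = 0xFFFF_FFFF_FFFF_FFFE  # everything except the continuation bit


class ChainEntry(NamedTuple):
    hash_bits: int   # hash with the continuation bit cleared
    has_next: bool   # True if another entry follows in this chain
    name: int
    typ: int
    value: bytes


def decode_chain_entry(buf: bytes, offset: int = 0) -> ChainEntry:
    word, name, typ, value = CHAIN_ENTRY.unpack_from(buf, offset)
    return ChainEntry(word & HASH_MASK, bool(word & 1), name, typ, value)


def walk_chain(buf: bytes, offset: int) -> Iterator[ChainEntry]:
    """Yield the entries of one chain, stopping after the entry with bit 0 clear."""
    while True:
        entry = decode_chain_entry(buf, offset)
        yield entry
        if not entry.has_next:
            return
        offset += CHAIN_ENTRY.size
```

A lookup would compare `hash_bits` against the probe hash with its lowest bit cleared as well, since that bit is not available for the hash.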
## 2D "hilbert curve" tables
An X-Y-indexable table. Like the previous tables, it stores key-value pairs, but the key is a 2-byte value: the first byte is an x coordinate and the second byte is a y coordinate. The purpose is to increase locality between adjacent x values and adjacent y values. Only the first `xybits` bits of each x and y value are honored. The size of the table is then `16 << (2 * xybits)`.
```
2dt header := xybits[1b] reserved[15b]
2dt entry := meta[8b] location[8b]
```
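The size formula follows from 16-byte entries and `2 ** (2 * xybits)` cells. The text does not spell out how an (x, y) pair maps to an entry index, so the sketch below uses the textbook iterative Hilbert curve mapping as an assumption. It also assumes that the honored bits are the low `xybits` bits of each coordinate. The names `table_size` and `hilbert_index` are hypothetical.

```python
ENTRY_SIZE = 16  # meta[8b] + location[8b]


def table_size(xybits: int) -> int:
    """Size in bytes of the entry area, as given by the formula in the text."""
    return ENTRY_SIZE << (2 * xybits)


def hilbert_index(xybits: int, x: int, y: int) -> int:
    """Map (x, y) to a position along a Hilbert curve of side 2**xybits.

    Textbook algorithm; that the format uses exactly this orientation
    is an assumption.
    """
    n = 1 << xybits
    x &= n - 1  # keep only the honored bits (assumed: low bits)
    y &= n - 1
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve has standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d
```

Under these assumptions, the byte offset of an entry within the entry area would be `ENTRY_SIZE * hilbert_index(xybits, x, y)`.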
## Nested Text
Strings are marked with their length in order to avoid nasty delimiter parsing and such. Text should be easily nestable. A nesting contains only other nestings or nt-strings.
```
nt string := typ[1b]=0x00 length[2b] data[1b * length] (utf-8)
nt nesting := typ[1b]!=0 subtype[2b] length[4b] elems[1b * length]
```
### Known nt nesting types
```
0x00---- NTT_STRING
0x010000 NTT_DIV
0x010001 NTT_GROUP
0x010002 NTT_HEADER
```