Introduction
This document defines a binary representation of a value-graph with strict encoding rules. The output octet sequence is always the same for a given graph.
Key words
The key words ‘MUST,’ ‘MUST NOT,’ ‘REQUIRED,’ ‘SHALL,’ ‘SHALL NOT,’ ‘SHOULD,’ ‘SHOULD NOT,’ ‘RECOMMENDED,’ ‘NOT RECOMMENDED,’ ‘MAY,’ and ‘OPTIONAL’ in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
Representation
The value-graph representation encodes two distinct elements, in the order given, as a sequence of octets:
- value block,
- list of statements (the graph).
Integers
All integers and unsigned and are encoded as follows:
- Let output be an empty sequence of octets.
- Encode the integer into its binary representation using the smallest amount of bits; let bits be this sequence.
- While length of bits is not a multiple of 7 bits:
- Prepend one cleared bit to bits.
- While bits has more than 7 bits:
- Pop the first 7 bits from bits and store these 7 bits as the least significant 7 bits of an octet octet.
- Set the most significant bit of octet.
- Append octet to output.
- Pop the first 7 bits from bits and store these 7 bits as the least significant 7 bits of an octet octet.
- Clear the most significant bit of octet.
- Append octet to output.
In other words, 0b1001010010 (10 bits) is extended to 0b00001001010010 (14 bits) and then encoded like this:
- ------- - ------- |1|0000100| |0|1010010| - ------- - ------- octet #1 octet #2
The most-significant bit indicates whether this is the last octet, the remaining 7 bits contains bits of the integer.
Value block
A value block begins with an integer: the length of the whole block, in octets.
After the integer follows a list of structures with the following fields:
- offset: integer.
- length: integer.
- chars: Sequence of characters encoded in UTF-8. [RFC2277]
The length field contains the amount of code points in the following char field.
The offset field contains an offset, in octets, from the beginning of the block, including the block length, pointing to a structure with a prefix for the current one. The value zero means there is no prefix.
In order to reconstruct a value from a structure: copy chars to a buffer; then recusively follow the offset pointer to another structure and prepend its chars to the buffer until there is no prefix.
List of statements
A list of statements begins with an integer: the amount of distinct subjects.
Then for each subject, in ascending order by code point, is the offset into the value block of the corresponding value of the subject, followed by a [list of properties][] of the subject.
A list of properties begins with an integer: the amount of distinct (subject, property) pairs.
Then for each such property, in ascending order by code point, is the offset into the value block of the corresponding value of the property, followed by a list of property values of the property.
A list of property values begins with an integer: the amount of distinct (subject, property, value) tuples, followed by an offset for each property value.
In order to reconstruct the graph: for each subject:
read the next subject into subject,
then for each property:
read the next property into property,
then for each property value:
read the next property value into value and emit a (subject, property, value) tuple.