Introduction

This specification defines abstract data objects called ‘value-graph descriptors’ and a URI scheme for naming them.

A value-graph descriptor, theoretically, contains the complete set of information about an item. Because it is not possible to define a "complete set of information," a canonical representation of a descriptor is impossible to generate.

Instead, descriptor content is represented as a set of its pieces. Representations differ depending on time and contacted database. You may think of them as answers to a precise search query, the query being: What do you know about X? The answer is different depending on who you ask; each respondent provides you with their known pieces.

Value-graph descriptors each belong to a collection. Each collection focuses on some topic of interest.

The document also defines a database interface for interacting with the descriptor system. Anything else is outside the scope of this document, including mapping said interface to a communication protocol.

Document conventions

Key words

The key words ‘MUST,’ ‘MUST NOT,’ ‘REQUIRED,’ ‘SHALL,’ ‘SHALL NOT,’ ‘SHOULD,’ ‘SHOULD NOT,’ ‘RECOMMENDED,’ ‘NOT RECOMMENDED,’ ‘MAY,’ and ‘OPTIONAL’ in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

Property shorthand

All words written #likeThis in this document, when the word begins with #, MUST be prefixed with vgd://HBW6WLCGLXH23GOS3M2DIGUD/!std.id32!03152074 in order to form the full word.

For example, #example is a shorthand notation for vgd://HBW6WLCGLXH23GOS3M2DIGUD/!std.id32!03152074#example.
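The expansion can be sketched in a few lines of Python. The function name expand_shorthand is hypothetical and only illustrates the rule above; the prefix is the one given in this section.

```python
# Prefix copied from the "Property shorthand" section of this document.
SHORTHAND_PREFIX = "vgd://HBW6WLCGLXH23GOS3M2DIGUD/!std.id32!03152074"


def expand_shorthand(word: str) -> str:
    """Expand a #shorthand word into its full form.

    Words that do not begin with "#" are returned unchanged.
    (Hypothetical helper, not part of the specification.)
    """
    if word.startswith("#"):
        return SHORTHAND_PREFIX + word
    return word
```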

ABNF rules

An Augmented Backus-Naur Form (ABNF) rule [RFC5234] is formatted like this:

rule-name = rule-elements

A reference to a rule is written <like-this>.

Value-graph

A value-graph is a set of statements.

A statement is a tuple of three values: the statement’s subject, property and property value.

A value is a graph node and a sequence of [UNICODE] characters. Values are globally unique within the system.

A reference is a value for which there is global consensus on how to interpret it and for which there are established rules against value collisions. References are most often IRIs in a normalized form. [RFC3987]

The property MUST be a reference.

The definition of value-graph semantics is detailed in [SEMANTICS].

Value-graph descriptor

A value-graph descriptor is a one-to-one mapping of a piece identifier to a descriptor piece.

A piece identifier is a character sequence with the same syntax as the value of the Message-ID field of Internet Messages. [RFC2822]

A descriptor piece is a pair of a value-graph and a list of signatures.

Signature

A signature is an SDO as defined in [SVO-FRAMEWORK]. [OBJECTS]

A signature is an assessment by a user that all of the statements contained within the value-graph of a descriptor piece are correct and true.

The user also states its confidence in the assessment.

Confidence level

A confidence level expresses the signer's confidence in its assessment of the truthfulness of information in the value-graph.

It is a numerical value in the range between 0.001 and 1.000, inclusive, in steps of 0.001.

The associated field #confidence is REQUIRED. Its value is a number.

Credibility

Each statement is assigned a credibility value.

The value is computed from the user's trust in the signer (a user-configurable multiplier value) and the signer's confidence level. It is the product of the two values.
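As a minimal sketch of the computation for a single signature, assuming the trust multiplier and the confidence level are both held as floating-point numbers (the function name is hypothetical, and how several signatures on one piece combine is not covered here):

```python
def credibility(trust: float, confidence: float) -> float:
    """Credibility contributed by one signature: trust times confidence.

    `trust` is the user-configured multiplier for the signer;
    `confidence` is the signer's confidence level (0.001 to 1.000).
    """
    if not 0.001 <= confidence <= 1.0:
        raise ValueError("confidence level must be between 0.001 and 1.000")
    return trust * confidence
```

Because trust is a multiplier applied at evaluation time, changing it re-weights every past signature from that signer without touching stored data.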

The point of this is to encourage users to verify information before signing and to aim at maintaining a high level of trust from others, because a change in trust impacts all signatures made in the past.

The system deliberately stores no data regarding the user who submitted the value-graph in order to put emphasis on what is being stated and the users' ability to verify the information presented to them.

Expiration time

When all of a piece's signatures are expired, the piece is expired, too. Such pieces are usually useless and are forgotten about.

The field should always be present unless your confidence level is very high.

Signature input

The creation time is represented as <sig-create>:

sig-create = date-time

The <date-time> rule is imported from [RFC3339].

The expiration time is represented as <sig-expire>:

sig-expire = date-time / %s"never"

The keyword never corresponds to an absent field.

The confidence level is represented as <sig-clevel>:

sig-clevel = %s"full" / 2DIGIT "." DIGIT

The value 1.000 is represented by the keyword full. Other values are expressed as a percentage without the percent sign; for example, 0.500 is written as 50.0 and 0.001 as 00.1.

The signature input is the sequence of octets generated as follows:

  1. Let input be an empty sequence of octets.
  2. Let graph be the piece's value-graph.
  3. Encode graph into its binary representation [BINARY] and append the resulting octet sequence to input.
  4. Encode the signer identifier with UTF-8 [RFC3629] and append the resulting octet sequence to input.
  5. Append the creation time as <sig-create> to input.
  6. Append the expiration time as <sig-expire> to input.
  7. Append the confidence level as <sig-clevel> to input.
  8. Encode the SVO-ID with UTF-8 [RFC3629] and append the resulting octet sequence to input.
  9. Return input.
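The steps above can be sketched as follows. This is an illustration under stated assumptions, not a normative implementation: the binary encoding of the graph [BINARY] is taken as an already-encoded octet string, the <sig-create>, <sig-expire> and <sig-clevel> strings are assumed to be appended as ASCII octets (the steps do not name an encoding for them), and the confidence level is held as an integer number of thousandths. All function and parameter names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def format_clevel(thousandths: int) -> str:
    """Render a confidence level (1..1000 thousandths) as <sig-clevel>."""
    if not 1 <= thousandths <= 1000:
        raise ValueError("confidence level out of range")
    if thousandths == 1000:
        return "full"
    # Steps of 0.001 correspond to steps of 0.1 percent: 2DIGIT "." DIGIT.
    return f"{thousandths // 10:02d}.{thousandths % 10}"


def format_time(moment: datetime) -> str:
    """Render a timezone-aware datetime as an RFC 3339 date-time."""
    if moment.tzinfo is None:
        raise ValueError("a UTC offset is required")
    return moment.isoformat()


def signature_input(graph_binary: bytes, signer_id: str, created: datetime,
                    expires: Optional[datetime], clevel_thousandths: int,
                    svo_id: str) -> bytes:
    out = bytearray()                                    # step 1
    out += graph_binary                                  # steps 2-3 (pre-encoded)
    out += signer_id.encode("utf-8")                     # step 4
    out += format_time(created).encode("ascii")          # step 5
    expire = "never" if expires is None else format_time(expires)
    out += expire.encode("ascii")                        # step 6
    out += format_clevel(clevel_thousandths).encode("ascii")  # step 7
    out += svo_id.encode("utf-8")                        # step 8
    return bytes(out)                                    # step 9
```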

Collections

Every value-graph descriptor is part of a collection.

A collection identifies a topic of interest and groups descriptors pertaining to a common subject.

VGCN

Each collection is identified by its Value-Graph Collection Number (VGCN), which is a user-provided, randomly-generated value.

The maximum value of a VGCN is governed by a variable called its length, which is expressed in blocks of 120 bits (15 octets).

The initial length is 1 block. The length is increased in steps of blocks, i.e. by 120 bits.

At least one bit in the last block MUST be set. A VGCN with all bits cleared is invalid.

Databases keep track of the current length. The length is increased whenever the number of known VGCNs reaches 2^(length-7) or more, where length is counted in bits. In other words, the second block is added after 2^(120-7) numbers have been allocated, the third after 2^(240-7) numbers, etc.

The blocks ensure that both base64 and base32 encodings of the binary representation produce strings without any padding. This also leaves one free octet in a 16-octet buffer for use by software, where a last-block marker or the amount of remaining blocks could be stored.

VGCN generation

In order to generate a new VGCN, a database MUST first be queried for the current length; then the necessary number of blocks, filled with random bits, is generated.

If the last block contains only cleared bits, the block needs to be generated again.
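A minimal sketch of the generation step, assuming the current length (in blocks) has already been obtained from a database and is passed in by the caller. The function name is hypothetical; Python's secrets module stands in for whatever source of random bits an implementation uses.

```python
import secrets

BLOCK_OCTETS = 15  # one block is 120 bits


def generate_vgcn(length_blocks: int) -> bytes:
    """Generate a random VGCN of the given length, in 120-bit blocks."""
    if length_blocks < 1:
        raise ValueError("the length is at least one block")
    head = secrets.token_bytes(BLOCK_OCTETS * (length_blocks - 1))
    last = secrets.token_bytes(BLOCK_OCTETS)
    # A last block with only cleared bits is not allowed; generate it again.
    while not any(last):
        last = secrets.token_bytes(BLOCK_OCTETS)
    return head + last
```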

Descriptor URI

Value-graph descriptors are identified by URIs [RFC3986] with a scheme of vgd.

The URI syntax is defined by the following <vgd-URI> rule.

The <reg-name> and <pchar> rules are imported from [RFC3986].

vgd-URI = "vgd://" vgd-auth vgd-path [ "?" vgd-query ]
;
vgd-auth   = VGCN / reg-name
VGCN       = 1*VGCN-block
VGCN-block = 24( %x32-37 / %x41-5A / %x62-77 )
;
vgd-path   = [ ref-direct ] *( ref-name )
ref-direct = "/!" 1*pchar "!" 1*pchar
ref-name   = "/"  1*pchar
;
vgd-query  = [ qparam *( "&" qparam ) ]
qparam     = 1*pchar "=" 0*pchar

The fragment component is omitted because it is independent of the scheme definition.

Each component is described in detail in the following sections.
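To make the shape of the grammar concrete, the following sketch splits a vgd URI into its components. It is a simplification under stated assumptions: it does not validate <pchar> or <reg-name>, it splits a direct reference at the first "!" after the leading one (the grammar allows "!" inside <pchar>, so this choice is an assumption), and the VGCN test mirrors the character ranges of <VGCN-block> exactly as written above. All names are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Character ranges copied from the VGCN-block rule as written:
# %x32-37 ("2"-"7"), %x41-5A ("A"-"Z"), %x62-77 ("b"-"w").
VGCN_RE = re.compile(r"(?:[2-7A-Zb-w]{24})+")


class VgdUri(NamedTuple):
    authority: str
    authority_is_vgcn: bool
    direct_ref: Optional[Tuple[str, str]]  # (namespace, name)
    named_refs: List[str]
    query: str
    fragment: str


def parse_vgd_uri(uri: str) -> VgdUri:
    prefix = "vgd://"
    if not uri.startswith(prefix):
        raise ValueError("not a vgd URI")
    rest, _, fragment = uri[len(prefix):].partition("#")
    rest, _, query = rest.partition("?")
    authority, slash, path = rest.partition("/")
    if not authority:
        raise ValueError("empty authority")
    segments = path.split("/") if slash else []
    direct_ref = None
    if segments and segments[0].startswith("!"):
        namespace, bang, name = segments[0][1:].partition("!")
        if not (bang and namespace and name):
            raise ValueError("malformed direct reference")
        direct_ref = (namespace, name)
        segments = segments[1:]
    if any(not seg for seg in segments):
        raise ValueError("empty path segment")
    is_vgcn = VGCN_RE.fullmatch(authority) is not None
    return VgdUri(authority, is_vgcn, direct_ref, segments, query, fragment)
```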

Scheme

The vgd in the scheme stands for ‘value-graph descriptor’. It is expected that these identifiers are stored in large amounts. A short scheme name was chosen in order to save memory and to make the computation time of URI comparison shorter.

Authority

The naming authority is a collection, canonically referenced by its number.

When a user inputs a URI to a computing system, the user MAY reference a collection via a registered name, which MUST be immediately dereferenced to a number on submission.

By convention, when writing in text and presenting a URI to a user, the VGCN SHOULD be written with capital letters and registered names SHOULD be written with small letters.

VGCN encoding

A textual representation of a VGCN is constructed by encoding it into text with base32. [RFC4648]

Note: One block (15 octets) produces 24 base32 characters.
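The encoding can be illustrated with Python's standard base64 module. The helper names are hypothetical; accepting small letters on input (casefold=True) is a convenience assumed for this sketch.

```python
import base64

BLOCK_OCTETS = 15  # one block is 120 bits


def encode_vgcn(raw: bytes) -> str:
    """Encode a binary VGCN as base32 text (24 characters per block)."""
    if not raw or len(raw) % BLOCK_OCTETS != 0:
        raise ValueError("a VGCN is a whole number of 15-octet blocks")
    return base64.b32encode(raw).decode("ascii")


def decode_vgcn(text: str) -> bytes:
    """Decode the base32 text form of a VGCN back into octets."""
    return base64.b32decode(text, casefold=True)
```

Since 15 octets are a multiple of 5, the encoder never emits "=" padding for a whole number of blocks.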

Registered name

Registered names are only intended for inter-human communication.

The authority is a registered name when it does not match <VGCN>.

This document defines only one domain of registered names: DNS [RFC1035] domain names.

Note: This document may be updated in the future in order to define additional domains of registered names, although it is believed that such a need will never arise. The author believes that no other system for storing human-readable, registered names than DNS is required, because any such system would ultimately have exactly the same problems as those identified with public DNS. Most of them are related not to the database system itself, but are rather problems with the management (such as lack of trust in it) and with the protocol used for transferring the data over a network link. Any human-readable public naming system requires a global public registry, which must be centrally managed in order to solve disputes over names. Even if you put the data on a blockchain or something, you will still end up with a group with control over it sooner or later.

Path

Collection Descriptor

When the path component is empty, the identifier references the Collection Descriptor, which, as the name suggests, contains information about the collection.

The ref-direct segment

The ref-direct segment contains a direct reference.

A direct reference begins with the U+0021 EXCLAMATION MARK character (!), followed by the name of a namespace, followed by a second U+0021 (!), followed by a name within the namespace.

For example, !example!name is a direct reference to the descriptor named name within the example namespace.

Both the namespace and the name MUST NOT be empty.

Direct references are unique names of value-graph descriptors within a collection.

Namespaces which begin with std. are reserved for standardization.

The ref-name segments

Each ref-name segment contains a named reference to a descriptor, which is relative to the current descriptor.

For example, a/b/c/d has four named references: a is resolved first relative to the current descriptor, then b relative to the result of resolving a, then c relative to the result of resolving b, and lastly d relative to the result of resolving c.

Query

The query component is a serialized list of name-value pairs, which communicate user-supplied parameters to a vgd URI resolver.

The serialization algorithm is as follows:

  1. Let output be an empty sequence of UTF-8 [RFC3629] code units.
  2. For each pair in the list being serialized:
    1. Let name be the UTF-8 encoding of the name.
    2. Let value be the UTF-8 encoding of the value.
    3. Replace all U+002B PLUS SIGN characters in value with their percent-encoded equivalent (%2B).
    4. Replace all U+0020 SPACE characters in value with U+002B PLUS SIGN.
    5. If output is not empty, append a U+0026 AMPERSAND character to output.
    6. If name contains any U+0026 AMPERSAND or U+003D EQUALS SIGN characters, percent-encode these characters.
    7. Append name to output.
    8. Append a U+003D EQUALS SIGN character to output.
    9. If value contains any U+0026 AMPERSAND characters, percent-encode these characters.
    10. Append value to output.
  3. Return output.
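A direct transcription of the steps above into Python might look like this. The function name is hypothetical, and the sketch follows the algorithm as written; in particular, it does not escape "%" characters, because the steps do not call for it.

```python
from typing import Iterable, Tuple


def serialize_query(pairs: Iterable[Tuple[str, str]]) -> bytes:
    """Serialize name-value pairs following the numbered steps above."""
    output = bytearray()                                   # step 1
    for name, value in pairs:                              # step 2
        n = name.encode("utf-8")                           # 2.1
        v = value.encode("utf-8")                          # 2.2
        v = v.replace(b"+", b"%2B")                        # 2.3
        v = v.replace(b" ", b"+")                          # 2.4
        if output:                                         # 2.5
            output += b"&"
        n = n.replace(b"&", b"%26").replace(b"=", b"%3D")  # 2.6
        output += n                                        # 2.7
        output += b"="                                     # 2.8
        v = v.replace(b"&", b"%26")                        # 2.9
        output += v                                        # 2.10
    return bytes(output)                                   # step 3
```

The order of steps 2.3 and 2.4 matters: literal plus signs are escaped before spaces are turned into plus signs, so the two remain distinguishable after serialization.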

Normalization

In order to normalize a vgd URI uri, dereference uri, then replace uri with the resulting current descriptor URI, retaining the fragment component of uri.

Resolution

The current descriptor URI is a URI.

The current descriptor is obtained by dereferencing the current descriptor URI to a value-graph descriptor.

In order to dereference a vgd URI:

  1. Parse query parameters.
  2. Resolve the authority to a VGCN.
  3. Set the current descriptor URI to the URI being dereferenced with all named references and query and fragment components removed.
  4. Obtain by direct reference the current descriptor.
  5. Resolve named references, if any.

When there are no more named references to resolve, the current descriptor and the current descriptor URI are returned as the result.

For example, given the URI vgd://example.com/!example!name/n1/n2?ex=ample#ex:

  1. Query parameters from ex=ample are decoded.
  2. Name example.com is dereferenced. Let's say it dereferences to 234567ABCDEFGHIJKLMNOPQR.
  3. Current descriptor URI is set to vgd://234567ABCDEFGHIJKLMNOPQR/!example!name.
  4. Current descriptor is obtained by direct reference: VGCN is 234567ABCDEFGHIJKLMNOPQR, direct reference is !example!name.
  5. Named reference n1 is resolved.
  6. Named reference n2 is resolved.

Domain name resolution

In order to resolve the authority to a VGCN, first test for an exact match with the <VGCN> rule.

If the authority is a match, decode the VGCN from its textual representation. Otherwise, the authority is a registered name.

In order to resolve a domain name to a VGCN on the Internet, issue a DNS standard query with QNAME set to the name prefixed with the _vgds label, QTYPE set to TXT (16), and QCLASS set to IN (1).

Then, look for a TXT RR in the answer section whose RDATA matches the following <vgds-DNS> rule, and extract the value from the first RR that matches.

vgds-DNS   = %s"VGCN " VGCN

Then, decode the textual representation <VGCN>.
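The selection of the matching TXT record can be sketched as below. The sketch performs no network I/O: it assumes the TXT strings from the answer section have already been obtained by some DNS client and are passed in as text. The function names are hypothetical, and the VGCN pattern mirrors the character ranges of <VGCN-block> as written in this document.

```python
import re
from typing import Iterable, Optional

# "VGCN " is case-sensitive (%s in the rule), followed by one or more blocks.
VGDS_DNS_RE = re.compile(r"VGCN ((?:[2-7A-Zb-w]{24})+)")


def query_name(domain: str) -> str:
    """QNAME for the lookup: the domain name prefixed with the _vgds label."""
    return "_vgds." + domain


def vgcn_from_txt(records: Iterable[str]) -> Optional[str]:
    """Return the textual VGCN from the first TXT record that matches."""
    for rdata in records:
        match = VGDS_DNS_RE.fullmatch(rdata)
        if match:
            return match.group(1)
    return None
```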

Direct-reference resolution

In order to obtain by direct reference a value-graph descriptor, given a VGCN together with a direct reference: connect to target databases and retrieve descriptor data, supplying the VGCN and the direct reference as input.

Please note that this is the sum of the data retrieved from the set of target databases, not from just one.

Named-reference resolution

In order to resolve a named reference seg, look in the current descriptor for statements:

Values are references of type #NamedRef by definition.

Then, for each property value ref, look for a statement:

  • subject of which is ref,
  • property is #name,
  • property value is an exact match to seg.

If there is more than one such statement, the graph is malformed and an error SHOULD be returned.

Then, look for a statement:

  • subject of which is ref,
  • property is #resolvesTo.

The property value is a reference to a value-graph descriptor.

If the reference is a URI of the vgd scheme, then it MUST be in the normalized form.

The current descriptor URI becomes the property value, which is resolved to a value-graph descriptor, which becomes the current descriptor.

The db parameter

The value of the db parameter is a space-separated list of database URIs and registered names.

By default, the list of target databases consists of only vgds.home.arpa. If the authority component was a registered name, the name is also included in the list.

Examples:

Input URI: vgd://example.com
Database #1: example.com
Database #2: vgds.home.arpa

Input URI: vgd://234567ABCDEFGHIJKLMNOPQR
Database #1: vgds.home.arpa

Input URI: vgd://example.com?db=example.org+http://example.com/rdg
Database #1: example.org
Database #2: http://example.com/rdg
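The selection of target databases shown in the examples can be sketched as follows. The function and parameter names are hypothetical. The sketch assumes that an explicit db parameter replaces the default list entirely, which is how the third example reads, and that the parameter value has already been decoded (plus signs turned back into spaces).

```python
from typing import List, Optional

DEFAULT_DATABASE = "vgds.home.arpa"


def target_databases(authority: str, authority_is_registered_name: bool,
                     db_param: Optional[str] = None) -> List[str]:
    """Compute the list of target databases for a vgd URI."""
    if db_param is not None:
        # Explicit list: space-separated database URIs and registered names.
        return db_param.split()
    targets = []
    if authority_is_registered_name:
        targets.append(authority)
    targets.append(DEFAULT_DATABASE)
    return targets
```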

The format parameter

The format parameter identifies the format of the representation. The value is either a media type or a URI of the data format.

By default, the value is multipart/prs.fumunokagomeko.vgds.mime. [VGDS-MIME]

The signer-include and signer-exclude parameters

The signer-include and signer-exclude parameters are each a space-separated list of POSIX Extended Regular Expressions or URIs.

A list element that begins with the ^ character is a regular expression; any other list element is a signer identifier.

The output MUST contain only those descriptor pieces whose signatures have a signer that matches at least one element in the signer-include list and does not match any element in the signer-exclude list.

By default, signer-include matches all and signer-exclude matches none.
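A sketch of the per-signer test is given below. It is an approximation: Python's re module is used in place of a POSIX Extended Regular Expression engine, and the two syntaxes differ in details. The function names are hypothetical, and None stands for an absent parameter.

```python
import re
from typing import Optional, Sequence


def element_matches(element: str, signer: str) -> bool:
    """Match one list element against a signer identifier."""
    if element.startswith("^"):
        # Elements beginning with "^" are regular expressions.
        return re.search(element, signer) is not None
    # Any other element is a literal signer identifier.
    return element == signer


def signer_allowed(signer: str,
                   include: Optional[Sequence[str]] = None,
                   exclude: Optional[Sequence[str]] = None) -> bool:
    """True if the signer passes signer-include and signer-exclude."""
    included = include is None or any(
        element_matches(e, signer) for e in include)
    excluded = exclude is not None and any(
        element_matches(e, signer) for e in exclude)
    return included and not excluded
```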

Direct-reference namespaces

A collection declares its direct reference namespaces within its Collection Descriptor.

In order to find out what namespaces are available, look in the Collection Descriptor for statements:

  • subject of which is the normalized Collection Descriptor URI,
  • property of which is #hasNamespace.

Property values are references of type #Namespace, which have the following two required properties:

  • #name: a #PathSegment: the name of the namespace.
  • #nsrange: a http://www.w3.org/2001/XMLSchema#pattern: valid syntax of names within the namespace.

The subject SHOULD also have some properties referring to human-readable data with further information on how to interpret the names.

The std.id32 namespace

The std.id32 namespace contains 32-bit unique descriptor identifiers. Its URI is #std.id32.

All names in this namespace MUST be an exact match to the ABNF <ID32> rule defined as follows:

ID32 = 8( DIGIT / %x61-66 ) ; hexadecimal with small letters

The std.id64 namespace

The std.id64 namespace contains 64-bit unique descriptor identifiers. Its URI is #std.id64.

All names in this namespace MUST be an exact match to the ABNF <ID64> rule defined as follows:

ID64 = 16( DIGIT / %x61-66 ) ; hexadecimal with small letters
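The <ID32> and <ID64> rules translate directly into regular expressions. The helper names below are hypothetical.

```python
import re

# 8 or 16 hexadecimal digits, small letters only, as in the ABNF rules.
ID32_RE = re.compile(r"[0-9a-f]{8}")
ID64_RE = re.compile(r"[0-9a-f]{16}")


def is_id32(name: str) -> bool:
    """True if name is an exact match to the <ID32> rule."""
    return ID32_RE.fullmatch(name) is not None


def is_id64(name: str) -> bool:
    """True if name is an exact match to the <ID64> rule."""
    return ID64_RE.fullmatch(name) is not None
```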

The std.hash namespace

The std.hash namespace contains hash values. Its URI is #std.hash.

All names in this namespace MUST be an exact match to the <alg-val> rule defined in [RFC6920].

Database

A database is a service that provides access to its known set of collections.

A database is identified canonically with a URI. The URI can be shortened to just a registered name.

Documents that define implementations of this interface define the syntax of their URIs and how to map a domain name to a URI.

This specification defines only an interface for a database protocol. Specific implementations of the interface are out of scope.

Databases SHOULD only store data they find useful.

The vgds.home.arpa database

This document defines the DNS domain name vgds.home.arpa. It is a name within a residential home network. [RFC8375]

Availability of this name is REQUIRED to comply with this document. It is configured locally per site as an opt-in to the RDG system.

Corresponding DNS records MUST locate at least one database.

This name allows one to refer to a collection in URI schemes and other naming systems that require a combination of user and host names.

For example, an acct URI such as acct:234567ABCDEFGHIJKLMNOPQR@vgds.home.arpa identifies an account associated with a collection, and a mailto URI such as mailto:234567ABCDEFGHIJKLMNOPQR@vgds.home.arpa identifies a mailbox associated with a collection.

Interface function: current length

Input: none.

Output:

Interface function: collection existence

Input:

Output:

  • Boolean: If true, the collection is known; false if not.

Interface function: retrieval

Input:

Output:

If the direct reference is empty, the request pertains to the Collection Descriptor.

Interface function: submission

Input:

If the direct reference is empty, the request pertains to the Collection Descriptor.

Output is a list of response tuples, each one made of:

  • a database URI,
  • a piece identifier,
  • a signer identifier,
  • a response code.

The response code is one of:

  • A: signature accepted,
  • R: signature rejected,
  • P: piece rejected,
  • C: piece collision,
  • N: piece not found.

Database behaviour

Databases MUST process the submitted descriptor piece as follows.

A piece with an empty value-graph is interpreted as a piece reference, in case the protocol used does not allow for explicit references.

  1. Let S be the submitted piece.
  2. Let result be a list of response tuples.
  3. For each signature sig in S:
    1. Let rt be a response tuple.
    2. Set database URI of rt to the URI of this database.
    3. Set piece identifier of rt to the piece identifier of S.
    4. Set signer identifier of rt to the signer identifier of sig.
    5. Set response code of rt to N.
    6. Append rt to result.
  4. Let P be a reference to a descriptor piece.
  5. If S is a piece reference:
    1. Find a descriptor by direct reference. If not found, return result.
    2. Find a piece by identifier of S. If not found, return result.
    3. Set P to the found piece.
    Otherwise (if S is not a piece reference):
    1. Find a descriptor by direct reference. If not found, create it.
    2. Find a piece by identifier of S. If found:
      1. Set P to the found piece.
      2. If the set of statements in the value-graph of S is not exactly the same as in the value-graph of P, update all response codes in result to C and return result.
      Otherwise (if piece was not found):
      1. If the value-graph of S does not meet the requirements of this document and of the database, update all response codes in result to P and return result.
      2. Insert S into the descriptor.
      3. Set P to S.
  6. For each signature sig in S:
    1. Verify sig over the value-graph of P.
    2. If sig is valid:
      1. Update the response code of the corresponding tuple in result to A.
      2. Look in P for a signature with the same signer identifier as sig.
      3. If found, replace it with sig. Otherwise, insert sig into the signature list of P.
      Otherwise (if signature is invalid), update the response code of the corresponding tuple in result to R.
  7. Return result.
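The procedure above can be sketched with an in-memory store. Everything here is illustrative: the class and field names are hypothetical, signature verification and graph validation are supplied by the caller as plain functions, and a value-graph is modelled as a frozen set of statement triples. One deliberate deviation is noted in the comments: a newly inserted piece is stored without the submitted signatures, so that only verified signatures end up in the store.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Tuple

Statement = Tuple[str, str, str]
Graph = FrozenSet[Statement]


@dataclass
class Signature:
    signer_id: str
    payload: bytes = b""


@dataclass
class Piece:
    piece_id: str
    graph: Graph
    signatures: List[Signature] = field(default_factory=list)


class Database:
    def __init__(self, uri: str,
                 verify: Callable[[Signature, Graph], bool],
                 accept_graph: Callable[[Graph], bool] = lambda g: True):
        self.uri = uri
        self.verify = verify
        self.accept_graph = accept_graph
        # (VGCN, direct reference) -> piece identifier -> piece
        self.descriptors: Dict[Tuple[str, str], Dict[str, Piece]] = {}

    def submit(self, vgcn: str, direct_ref: str, s: Piece) -> List[list]:
        # Steps 2-3: one response tuple per signature, initially "N".
        result = [[self.uri, s.piece_id, sig.signer_id, "N"]
                  for sig in s.signatures]

        def fail(code: str) -> List[list]:
            for rt in result:
                rt[3] = code
            return result

        key = (vgcn, direct_ref)
        if not s.graph:  # step 5: an empty graph marks a piece reference
            descriptor = self.descriptors.get(key)
            if descriptor is None or s.piece_id not in descriptor:
                return result
            p = descriptor[s.piece_id]
        else:
            descriptor = self.descriptors.setdefault(key, {})
            p = descriptor.get(s.piece_id)
            if p is not None:
                if p.graph != s.graph:
                    return fail("C")
            else:
                if not self.accept_graph(s.graph):
                    return fail("P")
                # Deviation: store a copy without unverified signatures.
                p = Piece(s.piece_id, s.graph)
                descriptor[s.piece_id] = p
        for rt, sig in zip(result, s.signatures):  # step 6
            if self.verify(sig, p.graph):
                rt[3] = "A"
                p.signatures = [x for x in p.signatures
                                if x.signer_id != sig.signer_id] + [sig]
            else:
                rt[3] = "R"
        return result  # step 7
```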

Note that a new signature may have an expiration time equal to the current time or in the past, which makes it immediately expired.

Pieces that have expired SHOULD be removed and those that have not SHOULD be kept until they do. It is up to the specific database when and which pieces are kept.

Users have no guarantee that databases will keep storing pieces. It is most desirable that users have their own database, instead of relying on others for storing and serving their pieces.

Security considerations

VGCN length

Because the length of a VGCN is a variable that increases with the passage of time, it can be used to determine a period of time within which a given VGCN was allocated.

There will be numbers "before" and "after" the increase.

Since some of the numbers are intended to identify collections with data about a user identity, there is a risk of discrimination based on this property: the age of a number suggests the age of the identity, which in turn suggests the user's age.

VGCN flood

Databases SHOULD be somehow protected from flood attacks, although this can be said about any kind of public network service.

Databases MAY also forget about collections for which there is no information available within all the networks that they operate in. Such collections could unnecessarily pollute the VGCN space.

Content of a value-graph

If a database stores value-graphs in the submitted format, it ought to remove all comments from the serialization, in order to prevent malicious users from using the database as a data storage by including comments with arbitrary data.

The statements in the graph SHOULD also be processed and validated. This document recommends putting pieces from unrecognized users into a quarantine to be later reviewed by a human being. Data from trusted users could skip the quarantine in general, but it is recommended to also quarantine it once in a while.