RDG: Framework

Introduction

This specification defines abstract data objects called ‘resource descriptors’ and a URI scheme for naming them.

A descriptor is abstract in the sense that there is no authority that decides what the "correct" representation of it is. It is neither a specific object nor a network location. Representations differ depending on the time and on the database contacted. You may think of them as answers to a precise search query, the query being: What do you know about X? The answer is different depending on whom you ask.

Resource descriptors are stored in collections (sub-graphs), each collection focusing on some topic of interest.

The document also defines an interface for interacting with descriptors and provides guidance regarding its intended use. Anything else is outside the scope of this document, including mapping said interface to a communication protocol.

Key words

The key words ‘MUST,’ ‘MUST NOT,’ ‘REQUIRED,’ ‘SHALL,’ ‘SHALL NOT,’ ‘SHOULD,’ ‘SHOULD NOT,’ ‘RECOMMENDED,’ ‘NOT RECOMMENDED,’ ‘MAY,’ and ‘OPTIONAL’ in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

Resource descriptor

A resource descriptor is a one-to-one mapping of a piece identifier to a descriptor piece.

A piece identifier is a character sequence with the same syntax as the value of the Message-ID field of Internet Messages. [RFC2822]

A descriptor piece is a pair of an RDF graph and a one-to-one mapping of a signer identifier to a signature.

An RDF graph is a non-empty set of RDF statements. [RDF]

A signer identifier is a URI. [RFC3986] The RECOMMENDED scheme is acct. [RFC7565]

A signature is a tuple of:

creation time,
expiration time,
confidence level,
signature verification object identifier,
signature data (octet sequence).

Signatures are described in more detail in the next section.

Graph requirements

All rd URIs in a graph MUST be in their normalized form.
There MUST NOT be any data (or similar) URIs. [RFC2397]

Descriptors are meant to store descriptions of other resources. Graphs that contain statements that, for example, establish that file F has content C, where C lists all bytes of the file, must be rejected. While such a description is technically correct, resource descriptors are not meant to store this kind of data.

Signatures

A signature is an assessment by a user that all of the statements contained within the RDF graph of a piece are correct and true. The user also states its confidence in the assessment.

Signature verification object

A signature verification object (SVO) is a one-to-one mapping of a character sequence to some data. Each such mapping is called a field named by the character sequence.

The REQUIRED field id maps to a <name>, referred to as SVO-ID, which uniquely identifies the object in the scope of other such objects associated with the same signer identifier.

name = 1*63VCHAR

Protocols for locating and retrieving these objects, as well as their storage format, are out of scope of this document.

The REQUIRED field algorithm maps to a <name>, which uniquely identifies an algorithm for verifying a signature. The algorithms utilize SVO fields not defined by this document. Their specification is out of scope of this document.

Creation time

A creation time records the date and time when a signature was generated.

Creation time is represented as <sig-create>:

sig-create = date-time

The <date-time> rule is imported from [RFC3339].

Signature verification objects MUST contain fields since and until, each being a <date-time> character string.

If the current date and time is before since or after until, the object is obsolete; otherwise it is current.

A signature is valid if and only if the date and time of its creation time is after or equal to since and before or equal to until.
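The two time-window rules above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the dictionary layout of the SVO and the helper names `parse_date_time`, `svo_is_current` and `creation_time_in_window` are assumptions made for this example, and the parser only handles the common `<date-time>` forms accepted by Python's `datetime.fromisoformat`.

```python
from datetime import datetime, timezone


def parse_date_time(text: str) -> datetime:
    """Parse a <date-time> string; a trailing 'Z' is rewritten as '+00:00'
    so that older Python versions accept it."""
    if text[-1] in "Zz":
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


def svo_is_current(svo: dict, now: datetime) -> bool:
    """An SVO is obsolete if now is before 'since' or after 'until';
    otherwise it is current."""
    return parse_date_time(svo["since"]) <= now <= parse_date_time(svo["until"])


def creation_time_in_window(sig_create: str, svo: dict) -> bool:
    """The validity rule: since <= creation time <= until."""
    created = parse_date_time(sig_create)
    return parse_date_time(svo["since"]) <= created <= parse_date_time(svo["until"])


# Hypothetical SVO; only the fields named in this document are shown.
svo = {
    "id": "key-1",
    "algorithm": "example-alg",
    "since": "2024-01-01T00:00:00Z",
    "until": "2024-12-31T23:59:59Z",
}
```

Both comparisons are inclusive, matching the "after or equal to" and "before or equal to" wording of the rule.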

Expiration time

An expiration time is either a specific date and time or a keyword.

Expiration time is represented as <sig-expire>:

sig-expire = date-time / sig-expkey
sig-expkey = %s"never"

A signature is expired if the current date and time is equal to or after the date and time represented by <sig-expire>.

Once all of its signatures expire, a piece is expired, too. Such pieces are usually useless and are forgotten about.

The only keyword defined by this document is never, indicating that the signature never expires. This value should be used only when the signer's confidence is very high.
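As an illustration of the expiration rule, the following sketch evaluates a `<sig-expire>` value against a given point in time. The function name `is_expired` is an assumption of this example, and the date parsing is limited to the forms accepted by Python's `datetime.fromisoformat`.

```python
from datetime import datetime, timezone


def is_expired(sig_expire: str, now: datetime) -> bool:
    """Return True if now is equal to or after the time in <sig-expire>."""
    if sig_expire == "never":  # the keyword is case-sensitive (%s"never")
        return False
    # Rewrite a trailing 'Z' so that older Python versions accept the string.
    text = sig_expire[:-1] + "+00:00" if sig_expire[-1] in "Zz" else sig_expire
    return now >= datetime.fromisoformat(text)
```

A piece would then be considered expired once `is_expired` holds for every one of its signatures.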

Confidence level

A confidence level expresses the signer's confidence in its assessment of the truthfulness of information in the graph. It is a numerical value in the range between 0.001 and 1.000, inclusive, in steps of 0.001.

Confidence is represented as <sig-clevel>:

sig-clevel = %s"full" / 2DIGIT "." DIGIT

The value 1.000 is represented by the keyword full. Other values are expressed as a percentage without the percent sign; for example, 0.875 is written as 87.5.

RDF statements are assigned credibility values. The value is computed from the user's trust in the signer (a user-configurable multiplier value) and the signer's confidence. It is a (multiplication) product of the two values.
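A small sketch of how `<sig-clevel>` might be parsed and combined with a trust multiplier is shown below. The function names and the use of exact fractions are assumptions of this example; the document does not prescribe a numeric type or a range for the trust multiplier.

```python
import re
from fractions import Fraction

# 2DIGIT "." DIGIT, i.e. a percentage with one decimal place
_CLEVEL = re.compile(r"[0-9]{2}\.[0-9]")


def parse_clevel(text: str) -> Fraction:
    """Convert <sig-clevel> into a value between 0.001 and 1.000."""
    if text == "full":
        return Fraction(1)
    if not _CLEVEL.fullmatch(text):
        raise ValueError("not a <sig-clevel>: %r" % text)
    thousandths = int(text[:2]) * 10 + int(text[3])
    if thousandths == 0:
        raise ValueError("00.0 is below the minimum of 0.001")
    return Fraction(thousandths, 1000)


def credibility(trust: Fraction, clevel: str) -> Fraction:
    """Credibility is the product of the user's trust and the signer's confidence."""
    return trust * parse_clevel(clevel)
```

For instance, a trust multiplier of one half applied to a confidence of 50.0 yields a credibility of one quarter.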

The point of this is to encourage users to verify information before signing and to set as their goal maintaining a high level of trust from others, because a change in trust impacts all signatures made in the past.

The system deliberately stores no data regarding the user who submitted the information in order to put emphasis on what is being stated and the users' ability to verify the information presented to them. Users that only produce or spread information are not trustworthy.

Signature data

Signature data is a sequence of octets. It is the result of applying some signature algorithm to the sequence of octets generated as follows:

Let input be an empty sequence of octets.
Let graph be the piece's RDF graph.
Append the signer identifier as <URI> to input.
Append the creation time as <sig-create> to input.
Append the expiration time as <sig-expire> to input.
Append the confidence level as <sig-clevel> to input.
Append the SVO-ID as <SVO-ID> to input.
Encode graph into its application/prs.fumunokagomeko.rdg-graph representation [RDG-GRAPH] and append the resulting sequence of octets to input.
Return input.

This algorithm ensures that signatures do not depend on the representation of a descriptor (of RDF graphs in particular).
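The steps above amount to a plain concatenation, which can be sketched as follows. The graph encoding itself is defined in [RDG-GRAPH] and is not reproduced here; the sketch takes the already encoded graph as an octet string. Encoding the textual fields as ASCII is an assumption of this example, as is the function name.

```python
def signature_input(signer_uri: str, sig_create: str, sig_expire: str,
                    sig_clevel: str, svo_id: str, graph_octets: bytes) -> bytes:
    """Build the octet sequence over which the signature is computed.

    graph_octets is assumed to be the RDG-GRAPH encoding of the piece's
    RDF graph, produced elsewhere.
    """
    output = bytearray()
    # Fields are appended in the listed order, with no separators,
    # as the steps do not mention any.
    for text in (signer_uri, sig_create, sig_expire, sig_clevel, svo_id):
        output += text.encode("ascii")  # assumption: textual fields are ASCII
    output += graph_octets
    return bytes(output)
```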

Signature verification object rotation

Every signer keeps an octet sequence list for every current signature verification object.

Once a signature is generated, its creation time as <date-time> and then its signature data are appended to list.

Once the signature verification object becomes obsolete, the signer generates an octet sequence data, which signs list and which is verified by another, current SVO.

A tuple consisting of the following fields is published under a field named signatures in the obsolete SVO:

list: the octet sequence list
data: the signature data
svoid: SVO-ID associated with data
time: creation time of data

This prevents a malicious actor from generating counterfeit signatures verified with obsolete SVOs in the future.

For obsolete SVOs, a signature is valid if and only if its creation time and signature data exist within list.

URI comparison

When comparing URIs according to this framework:

Always resolve to the full URI when encountering a URI reference.
If the scheme is rd, normalization is REQUIRED; otherwise, normalization is RECOMMENDED.

Database

Resource descriptors are stored by databases.

This specification defines only an interface for a database protocol. Specific implementations of the interface are out of scope.

Collections

Databases store resource descriptors within collections. A collection identifies a topic of interest and groups descriptors pertaining to a common subject. Collections MUST be distributed and decentralized.

Databases and users can subscribe to a collection in order to synchronize their descriptors in the scope of a single collection. Users and databases SHOULD only store descriptors they find useful.

Each such collection in a database is identified by its Resource-descriptor Collection Number (RDCN), which is a user-provided, randomly-generated unsigned integer.

The maximum value of an RDCN is determined by a variable called its length. The length of an RDCN is expressed in units of 120-bit blocks.

The initial length is 1 block. The length is increased in steps of blocks, i.e. by 120 bits.

At least one bit of the last block MUST be set. An RDCN with all bits cleared is invalid.

Databases keep track of the current RDCN length. The length is increased whenever the number of known RDCNs reaches 2^(length-7) or more, where length is counted in bits. In other words, the second block is added after 2^(120-7) numbers have been allocated, the third after 2^(240-7) numbers, etc.

The blocks ensure that both base64 and base32 encodings of the binary representation produce strings without any padding. This also leaves one free octet in a 16-octet buffer for use by software, where a last-block marker or the amount of remaining blocks could be stored.
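The padding claim follows from the block size: 15 octets are 120 bits, which is a multiple of both 5 (base32) and 6 (base64). A quick check with Python's standard `base64` module, using arbitrary example octets:

```python
import base64

BLOCK_OCTETS = 15  # one 120-bit block


def encoded_lengths(blocks: int) -> tuple:
    """Return (base32 length, base64 length) for the given number of blocks."""
    octets = bytes(range(BLOCK_OCTETS)) * blocks  # arbitrary example content
    b32 = base64.b32encode(octets)
    b64 = base64.b64encode(octets)
    # Neither encoding needs '=' padding for whole blocks.
    assert b"=" not in b32 and b"=" not in b64
    return len(b32), len(b64)
```

Each block contributes 24 base32 characters or 20 base64 characters.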

Generation of RDCNs

Length of new RDCNs MUST be equal to the current length.

In order to generate a new RDCN, a user MUST first contact a database in order to determine its current RDCN length. The user then generates the necessary number of blocks filled with random bits.
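A minimal sketch of the generation step is given below. How the current length is obtained from a database is out of scope here, so it is passed in as a parameter. The function name is an assumption of this example, and the integer is built with the least significant octet first, matching the octet order used by the textual RDCN encoding defined later in this document.

```python
import secrets

BLOCK_OCTETS = 15  # one 120-bit block


def generate_rdcn(length_blocks: int) -> int:
    """Generate a random RDCN of exactly length_blocks blocks."""
    if length_blocks < 1:
        raise ValueError("length must be at least 1 block")
    while True:
        octets = secrets.token_bytes(BLOCK_OCTETS * length_blocks)
        # At least one bit of the last block must be set; retry otherwise.
        if any(octets[-BLOCK_OCTETS:]):
            return int.from_bytes(octets, "little")
```

The retry loop is practically never taken, since an all-zero last block occurs with probability 2^-120.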

Descriptor URI

Resource descriptors are identified by the rd URI scheme. Uniform Resource Identifiers are defined in [RFC3986].

Its syntax is defined by the following rd-URI ABNF rule. [RFC5234] The reg-name, pchar and fragment rules are imported from [RFC3986].

rd-URI   = "rd://" rd-auth rd-path [ "?" rd-query ] [ "#" fragment ]
;
rd-auth    = RDCN / reg-name
RDCN       = 1*RDCN-block
RDCN-block = 24( %x32-37 / %x41-5A / %x62-77 )
;
rd-path    = [ ref-direct ] *( ref-name )
ref-direct = "/!" 1*pchar "!" 1*pchar
ref-name   = "/"  1*pchar
;
rd-query   = [ qparam *( "&" qparam ) ]
qparam     = 1*pchar "=" 0*pchar

Each component is described in detail in the following sections.

Scheme

The rd in the scheme stands for ‘resource descriptor’. It is expected that these identifiers are stored in large amounts. A two-letter abbreviation was chosen in order to save memory and to make the computation time of URI comparisons shorter.

Authority

The naming authority is a collection, canonically referenced by its number.

When a user inputs a URI to a computing system, the user MAY reference a collection via a registered name, which MUST be immediately dereferenced to a number on submission.

By convention, when writing in text and presenting a URI to a user, the RDCN SHOULD be written with capital letters and registered names SHOULD be written with small letters.

RDCN encoding

A textual representation of an RDCN is constructed by representing the number as a sequence of octets in ascending order of octet significance and then encoding it into text with base32. [RFC4648]

Note: One block (15 octets) produces 24 base32 characters.

Trailing empty blocks (those with all bits cleared) MUST be removed.

For example, a 2-block RDCN 234567ABCDEFGHIJKLMNOPQRAAAAAAAAAAAAAAAAAAAAAAAA MUST be encoded as 234567ABCDEFGHIJKLMNOPQR.
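The encoding and the removal of trailing empty blocks can be sketched with Python's standard `base64` module. The function names are assumptions of this example. Because the encoder works from the integer value, it emits only as many blocks as the number needs, which has the same effect as removing trailing empty blocks.

```python
import base64

BLOCK_OCTETS = 15   # one 120-bit block
BLOCK_CHARS = 24    # base32 characters per block


def encode_rdcn(number: int) -> str:
    """Encode an RDCN as base32, least significant octet first."""
    if number <= 0:
        raise ValueError("an RDCN with all bits cleared is invalid")
    blocks = (number.bit_length() + 8 * BLOCK_OCTETS - 1) // (8 * BLOCK_OCTETS)
    octets = number.to_bytes(blocks * BLOCK_OCTETS, "little")
    return base64.b32encode(octets).decode("ascii")


def decode_rdcn(text: str) -> int:
    """Decode the textual representation back into an integer."""
    if not text or len(text) % BLOCK_CHARS:
        raise ValueError("an RDCN consists of whole 24-character blocks")
    return int.from_bytes(base64.b32decode(text), "little")
```

Decoding the 2-block form from the example above and encoding the result again yields the 1-block form, since the second block contributes only zero bits.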

Registered name

Registered names are intended for communicating an RDCN between humans, by way of speech or human-readable text, because RDCNs, being large numbers, are otherwise difficult for humans to transmit.

The authority is a registered name when it does not match the RDCN rule.

This document defines only one domain of registered names: DNS [RFC1035] domain names.

Note: This document may be updated in the future in order to define additional domains of registered names, although it is believed that such a need will never arise. The author believes that no other system for storing human-readable, registered names than DNS is required, because any such system would ultimately have exactly the same problems as those identified with public DNS. Most of them are related not to the database system itself, but are rather problems with the management (such as lack of trust in it) and with the protocol used for transferring the data over a network link. Any human-readable public naming system requires a global public registry, which must be centrally managed in order to solve disputes over names. Even if you put the data on a blockchain or something, you will still end up with a group with control over it sooner or later.

The `rd-graph.home.arpa` database

This document defines the DNS domain name rd-graph.home.arpa. It is a name within a residential home network. [RFC8375]

Availability of this name is REQUIRED to comply with this document. It is configured locally per site as an opt-in to the RDG system.

Corresponding DNS records MUST locate at least one database.

This name MAY be used, for example, in an acct URI such as acct:234567ABCDEFGHIJKLMNOPQR@rd-graph.home.arpa to identify an account associated with a collection; or in a mailto URI such as mailto:234567ABCDEFGHIJKLMNOPQR@rd-graph.home.arpa to identify a mailbox associated with a collection.

In other words, this name allows one to refer to a collection in URI schemes and other naming systems that require a host name.

Path

Collection Descriptor

When the path component of a URI is empty, the identifier references the Collection Descriptor, which, as the name suggests, contains information about the collection.

The `ref-direct` segment

The ref-direct segment contains a direct reference.

A direct reference begins with the U+0021 EXCLAMATION MARK character (!), followed by the name of a namespace, followed by a second U+0021 (!), followed by a name within the namespace.

For example, !example!name is a direct reference to the descriptor named name within the example namespace.

Both the namespace and the name MUST NOT be empty.

Namespaces which begin with the character sequence std. (U+0073 U+0074 U+0064 U+002E) are reserved for standardization. Some of these namespaces are defined in this document.

Direct references are unique names of descriptors in a collection.

The `ref-name` segment

Each ref-name segment contains a named reference to a descriptor, which is relative to the current descriptor.

For example, /a/b/c/d has four named references: a is resolved first relative to the current descriptor, then b relative to the result of resolving a, then c relative to the result of resolving b, and lastly d relative to the result of resolving c.

Query

The query component is a serialized list of name-value pairs, which communicate user-supplied parameters to an rd URI resolver.

The serialization algorithm is as follows:

Let output be an empty sequence of UTF-8 [RFC3629] code units.
For each pair in the list being serialized:
1. Let name be the UTF-8 encoding of the name.
2. Let value be the UTF-8 encoding of the value.
3. Replace all U+002B PLUS SIGN characters in value with its percent-encoded equivalent (%2B).
4. Replace all U+0020 SPACE characters in value with U+002B PLUS SIGN.
5. If output is not empty, append a U+0026 AMPERSAND character to output.
6. If name contains any U+0026 AMPERSAND or U+003D EQUALS SIGN characters, percent-encode these characters.
7. Append name to output.
8. Append a U+003D EQUALS SIGN character to output.
9. If value contains any U+0026 AMPERSAND characters, percent-encode these characters.
10. Append value to output.
Return output.
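The serialization steps translate almost line by line into code. The sketch below follows them literally and performs no escaping beyond what the steps list; the function name is an assumption of this example.

```python
def serialize_query(pairs) -> bytes:
    """Serialize an iterable of (name, value) string pairs."""
    output = bytearray()
    for name, value in pairs:
        name_octets = name.encode("utf-8")                    # step 1
        value_octets = value.encode("utf-8")                  # step 2
        value_octets = value_octets.replace(b"+", b"%2B")     # step 3
        value_octets = value_octets.replace(b" ", b"+")       # step 4
        if output:                                            # step 5
            output += b"&"
        name_octets = (name_octets.replace(b"&", b"%26")      # step 6
                                  .replace(b"=", b"%3D"))
        output += name_octets                                 # step 7
        output += b"="                                        # step 8
        value_octets = value_octets.replace(b"&", b"%26")     # step 9
        output += value_octets                                # step 10
    return bytes(output)
```

The order of steps 3 and 4 matters: literal plus signs are escaped first, so that every plus sign remaining in the output stands for a space.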

Normalization

In order to normalize an rd URI uri, dereference uri, then replace uri with the resulting current descriptor URI, retaining the fragment component of uri.

Resolution

The current descriptor URI is a URI.

The current descriptor is obtained by dereferencing the current descriptor URI to a resource descriptor.

In order to dereference an rd URI:

Parse query parameters.
Resolve the authority to an RDCN.
Set the current descriptor URI to the URI being dereferenced with all named references and query and fragment components removed.
Obtain by direct reference the current descriptor.
Resolve named references, if any.

When there are no more named references to resolve, the current descriptor and the current descriptor URI are returned as the result.

For example, given the URI rd://example.com/!example!name/n1/n2?ex=ample#ex:

Query parameters from ex=ample are decoded.
Name example.com is dereferenced. Let's say it dereferences to 234567ABCDEFGHIJKLMNOPQR.
Current descriptor URI is set to rd://234567ABCDEFGHIJKLMNOPQR/!example!name.
Current descriptor is obtained by direct reference: RDCN is 234567ABCDEFGHIJKLMNOPQR, direct reference is !example!name.
Named reference n1 is resolved.
Named reference n2 is resolved.
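The purely syntactic part of this walk-through, splitting the URI into its components and forming the current descriptor URI, can be sketched with `urllib.parse.urlsplit` from the Python standard library. The helper names are assumptions of this example, and authority resolution and database access are deliberately left out.

```python
from urllib.parse import urlsplit


def split_rd_uri(uri: str):
    """Return (authority, direct reference or None, named references,
    query, fragment) for an rd URI."""
    parts = urlsplit(uri)
    if parts.scheme != "rd":
        raise ValueError("not an rd URI")
    segments = parts.path.split("/")[1:] if parts.path else []
    direct = None
    # A ref-direct segment, if present, is the first one and starts with '!'.
    if segments and segments[0].startswith("!"):
        direct = segments.pop(0)
    return parts.netloc, direct, segments, parts.query, parts.fragment


def current_descriptor_uri(rdcn_text: str, direct) -> str:
    """Initial current descriptor URI: no named references, query or fragment."""
    return "rd://" + rdcn_text + ("/" + direct if direct else "")
```

An empty path leaves both the direct reference and the list of named references empty, which corresponds to the Collection Descriptor.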

Domain name resolution

In order to resolve the authority to an RDCN, first test for an exact match with the RDCN rule.

If authority is a match, decode the RDCN from its textual representation. Otherwise, the authority is a registered name.

In order to resolve a domain name to an RDCN in the Internet, issue a query with QNAME set to the name and QTYPE set to TXT.

Then, look for a TXT record in the response that matches the following rd-DNS rule and extract the value from the first record that matches.

rd-DNS   = %s"rd-graph " RDCN

Then, decode the textual representation from the record.
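Selecting the record can be sketched as follows. The DNS query itself is not shown; the function receives the TXT record strings that some resolver has already returned. The function name is an assumption of this example, and the character class mirrors the RDCN-block rule exactly as written in the ABNF above.

```python
import re

# %s"rd-graph " RDCN, with RDCN = 1*RDCN-block and 24 characters per block
_RD_DNS = re.compile(r"rd-graph ((?:[2-7A-Zb-w]{24})+)")


def rdcn_from_txt(records):
    """Return the RDCN text from the first matching TXT record, or None."""
    for record in records:
        match = _RD_DNS.fullmatch(record)
        if match:
            return match.group(1)
    return None
```

Records that do not match the rule, such as SPF policies published under the same name, are skipped.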

Direct-reference resolution

In order to obtain by direct reference a resource descriptor, given an RDCN together with a direct reference, connect to target databases and retrieve descriptor data, supplying the RDCN and the direct reference as input.

Please note that this is the sum of the data retrieved from the set of target databases, not from just one.

Named-reference resolution

In order to resolve a named reference seg, look in the current descriptor for RDF statements:

subjects of which are the current descriptor URI,
predicates are <rd://HBW6WLCGLXH23GOS3M2DIGUD/!std.id32!03152074#hasRef>.

Objects are resources of type <rd://HBW6WLCGLXH23GOS3M2DIGUD/!std.id32!03152074#NamedRef> by definition.

Then, for each found object resource ref, look for RDF statements:

subjects of which are ref,
predicates are <rd://HBW6WLCGLXH23GOS3M2DIGUD/!std.id32!03152074#name>,
objects are literal values that are an exact match to seg.

If there is more than one matching subject ref, which one is chosen is undefined; such a graph is malformed.

Then, look for an RDF statement:

subject of which is ref,
predicate is <rd://HBW6WLCGLXH23GOS3M2DIGUD/!std.id32!03152074#resolvesTo>.

The object is an RDF resource that is a resource descriptor.

If the URI of the object resource is of the rd scheme, then it MUST be in the normalized form.

The current descriptor URI becomes the object URI, which is resolved to a resource descriptor, which becomes the current descriptor.
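The lookup can be illustrated over a graph held as a set of (subject, predicate, object) tuples. This representation, the use of plain strings for both URIs and literals, and the function name are assumptions of this example. Because the choice among several matches is undefined, the sketch simply reports such a graph as an error.

```python
NS = "rd://HBW6WLCGLXH23GOS3M2DIGUD/!std.id32!03152074#"


def resolve_named_ref(triples, current_uri: str, seg: str):
    """Return the URI that the named reference seg resolves to, or None."""
    # Objects of hasRef statements about the current descriptor
    refs = {o for s, p, o in triples if s == current_uri and p == NS + "hasRef"}
    # Keep those whose name is an exact match to seg
    matches = [ref for ref in refs if (ref, NS + "name", seg) in triples]
    if not matches:
        return None
    if len(matches) > 1:
        raise ValueError("malformed graph: ambiguous named reference")
    for s, p, o in triples:
        if s == matches[0] and p == NS + "resolvesTo":
            return o
    return None
```

The returned URI would then become the current descriptor URI and be resolved to a resource descriptor in turn.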

The `db` parameter

The value of the db parameter is a space-separated list of database URIs and registered names.

By default, the list of target databases consists of only rd-graph.home.arpa. If the authority component was a registered name, the name is also included in the list.

Examples:

Input URI: rd://example.com/desc: Database #1: example.com; Database #2: rd-graph.home.arpa
Input URI: rd://234567ABCDEFGHIJKLMNOPQR/desc: Database #1: rd-graph.home.arpa
Input URI: rd://example.com/desc?db=example.org+http://example.com/rdg: Database #1: example.org; Database #2: http://example.com/rdg
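The selection of target databases shown in these examples can be sketched as below. That an explicit db parameter replaces the default list entirely is inferred from the last example rather than stated outright, so treat it as an assumption; the function name and its parameters are likewise hypothetical. The db value is expected in decoded form, that is, with spaces rather than plus signs.

```python
DEFAULT_DATABASE = "rd-graph.home.arpa"


def target_databases(authority: str, is_registered_name: bool, db=None):
    """Return the ordered list of target databases for a dereference."""
    if db is not None:
        # Assumption: an explicit list replaces the defaults.
        return db.split(" ")
    targets = [authority] if is_registered_name else []
    targets.append(DEFAULT_DATABASE)
    return targets
```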

The `format` parameter

The format parameter identifies the format of the representation. The value is either a media type or a URI of the data format.

By default, the value is multipart/prs.inumi.rdg-mime. [RDG-MIME]

The `signer-include` and `signer-exclude` parameters

The signer-include and signer-exclude parameters are both a space-separated list of POSIX Extended Regular Expressions or URIs.

A list element that begins with the ^ character is a regular expression; any other element is a signer identifier.

The output MUST contain only those descriptor pieces which are signed by at least one signer whose URI matches at least one element in the signer-include list and does not match any of the elements in the signer-exclude list.

By default, signer-include matches all and signer-exclude matches none.
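The filter can be sketched as follows. Python's `re` module is used here as a stand-in and is not a POSIX Extended Regular Expression engine, so patterns that rely on POSIX-specific behaviour may match differently. The function names are assumptions of this example, and `None` stands for an absent parameter.

```python
import re


def _element_matches(element: str, signer: str) -> bool:
    """Elements starting with '^' are patterns; others are exact identifiers."""
    if element.startswith("^"):
        return re.search(element, signer) is not None
    return element == signer


def signer_selected(signer: str, include=None, exclude=None) -> bool:
    """Apply signer-include and signer-exclude to a single signer URI."""
    included = include is None or any(_element_matches(e, signer) for e in include)
    excluded = exclude is not None and any(_element_matches(e, signer) for e in exclude)
    return included and not excluded


def piece_selected(signers, include=None, exclude=None) -> bool:
    """A piece is output if at least one of its signers is selected."""
    return any(signer_selected(s, include, exclude) for s in signers)
```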

Direct-reference namespaces

A collection declares its direct reference namespaces within its Collection Descriptor.

In order to find out what namespaces are available, look in the Collection Descriptor for RDF statements:

subject of which is the normalized Collection Descriptor URI,
predicate of which is rd://HBW6WLCGLXH23GOS3M2DIGUD/!std.id32!03152074#hasNamespace,
object of which is a resource of type rd://HBW6WLCGLXH23GOS3M2DIGUD/!std.id32!03152074#Namespace.

A resource of type rd://HBW6WLCGLXH23GOS3M2DIGUD/!std.id32!03152074#Namespace has the following two required properties:

rd://HBW6WLCGLXH23GOS3M2DIGUD/!std.id32!03152074#name: an rd://HBW6WLCGLXH23GOS3M2DIGUD/!std.id32!03152074#PathSegment: the name of the namespace.
rd://HBW6WLCGLXH23GOS3M2DIGUD/!std.id32!03152074#nsrange: an http://www.w3.org/2001/XMLSchema#pattern: valid syntax of names within the namespace.

The subject SHOULD also have some properties referring to human-readable data with further information on how to interpret the names.

The `std.id32` namespace

The std.id32 namespace contains 32-bit unique descriptor identifiers. Its URI is rd://HBW6WLCGLXH23GOS3M2DIGUD/!std.id32!03152074#std.id32.

All names in this namespace MUST be an exact match to the ID32 ABNF rule defined as follows:

ID32 = 8( DIGIT / %x61-66 ) ; hexadecimal with small letters

The `std.id64` namespace

The std.id64 namespace contains 64-bit unique descriptor identifiers. Its URI is rd://HBW6WLCGLXH23GOS3M2DIGUD/!std.id32!03152074#std.id64.

All names in this namespace MUST be an exact match to the ID64 ABNF rule defined as follows:

ID64 = 16( DIGIT / %x61-66 ) ; hexadecimal with small letters
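The ID32 and ID64 rules map directly onto regular expressions. A minimal sketch, with hypothetical function names:

```python
import re

_ID32 = re.compile(r"[0-9a-f]{8}")    # 8 hexadecimal digits, small letters
_ID64 = re.compile(r"[0-9a-f]{16}")   # 16 hexadecimal digits, small letters


def is_id32(name: str) -> bool:
    """True if name is an exact match to the ID32 rule."""
    return _ID32.fullmatch(name) is not None


def is_id64(name: str) -> bool:
    """True if name is an exact match to the ID64 rule."""
    return _ID64.fullmatch(name) is not None
```

Names with capital letters are rejected, since the rules only admit the octets %x61-66 for the letters.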

The `std.hash` namespace

The std.hash namespace contains hash values. Its URI is rd://HBW6WLCGLXH23GOS3M2DIGUD/!std.id32!03152074#std.hash.

All names in this namespace MUST be an exact match to the alg-val rule defined in [RFC6920].

Database interface

Interface of a database has two functions: retrieval and submission.

A database is identified canonically with a URI. The URI can be shortened to just a registered name.

Documents that define implementations of this interface define the syntax of their URIs and how to map a domain name to a URI.

Retrieval

Input parameters:

Resource-descriptor Collection Number
direct reference

Output is a representation of a resource descriptor.

If the direct reference is empty, the request pertains to the Collection Descriptor.

This is a simple data retrieval operation.

Submission

Input parameters:

Resource-descriptor Collection Number
direct reference
descriptor piece

If the direct reference is empty, the request pertains to the Collection Descriptor.

Output is a list of response tuples, each one made of:

a database URI from which the tuple originates,
piece identifier,
signer identifier,
response code.

The response code is one of:

A: signature accepted,
R: signature rejected,
P: piece rejected,
C: piece collision,
N: piece not found.

Database behaviour

Databases MUST process the submitted descriptor piece as follows.

A piece with an empty graph is interpreted as a piece reference, in case the protocol used does not allow for explicit references.

Let S be the submitted piece.
Let result be a list of response tuples.
For each signature sig in S:
1. Let rt be a response tuple.
2. Set database URI of rt to the URI of this database.
3. Set piece identifier of rt to the piece identifier of S.
4. Set signer identifier of rt to the signer identifier of sig.
5. Set response code of rt to N.
6. Append rt to result.
Let P be a reference to a descriptor piece.
If S is a piece reference:
1. Find a descriptor by direct reference. If not found, return result.
2. Find a piece by identifier of S. If not found, return result.
3. Set P to the found piece.
Otherwise (if S is not a piece reference):
1. Find a descriptor by direct reference. If not found, create it.
2. Find a piece by identifier of S. If found:
  1. Set P to the found piece.
  2. If the set of statements in the graph of S is not exactly the same as in the graph of P, update all response codes in result to C and return result.
  Otherwise (if piece was not found):
  1. If the graph of S does not meet the requirements of this document and of the database, update all response codes in result to P and return result.
  2. Insert S into the descriptor.
  3. Set P to S.
For each signature sig in S:
1. Verify sig over the graph of P.
2. If signature is correct:
  1. Update the response code of the corresponding tuple in result to A.
  2. Look in P for a signature with the same signer identifier as sig.
  3. If found, replace it with sig. Otherwise, insert sig into the signature map of P.
  Otherwise (if signature verification failed), update the response code of the corresponding tuple in result to R.
Return result.
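The processing steps can be sketched as an in-memory model. Everything beyond the control flow is an assumption of this example: the class and field names, graphs as frozen sets of triples, and the `verify` and `acceptable` callables, which stand in for signature verification and for the graph requirements check. One simplification is worth noting: a newly stored piece starts with an empty signature map and only receives signatures that verify.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Piece:
    identifier: str
    graph: frozenset                                  # set of RDF statements
    signatures: Dict[str, object] = field(default_factory=dict)  # signer -> signature


@dataclass
class Database:
    uri: str
    verify: Callable[[str, object, frozenset], bool]  # hypothetical verifier
    acceptable: Callable[[frozenset], bool]           # hypothetical graph check
    descriptors: Dict[tuple, Dict[str, Piece]] = field(default_factory=dict)

    def submit(self, rdcn: int, direct_ref: str, s: Piece) -> list:
        # One response tuple per submitted signature, initially N (not found)
        result = [[self.uri, s.identifier, signer, "N"] for signer in s.signatures]

        def fail(code: str) -> list:
            for rt in result:
                rt[3] = code
            return result

        key = (rdcn, direct_ref)
        if not s.graph:                               # S is a piece reference
            p = self.descriptors.get(key, {}).get(s.identifier)
            if p is None:
                return result
        else:
            pieces = self.descriptors.setdefault(key, {})  # create if missing
            p = pieces.get(s.identifier)
            if p is not None:
                if p.graph != s.graph:
                    return fail("C")                  # piece collision
            else:
                if not self.acceptable(s.graph):
                    return fail("P")                  # piece rejected
                p = Piece(s.identifier, s.graph)
                pieces[s.identifier] = p
        for rt, (signer, sig) in zip(result, s.signatures.items()):
            if self.verify(signer, sig, p.graph):
                rt[3] = "A"
                p.signatures[signer] = sig            # insert or replace
            else:
                rt[3] = "R"
        return result
```

Submitting a piece with an empty graph and a new signature adds that signature to an already stored piece, which is how the piece reference case is meant to be used.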

Note that a new signature may have an expiration time equal to the current time or in the past, which effectively revokes it.

Pieces that have expired SHOULD be removed and those that have not SHOULD be kept until they do. It is up to the specific database when and which pieces are kept. Users have no guarantee that databases will keep storing pieces. It is most desirable that users have their own database, instead of relying on others for storing and serving their pieces.

Security considerations

RDCN length

Because the length of an RDCN is a variable that increases with the passage of time, it can be used to determine a period of time within which a given RDCN was allocated.

There will be numbers "before" and "after" the increase.

Since some of the numbers are intended to identify user accounts, there is a risk of discrimination based on this property: the age of a number suggests the age of the account, which in turn suggests the age of the user.

RDCN flood

Databases SHOULD be somehow protected from flood attacks, although this can be said about any kind of public network service.

Databases MAY also forget about collections for which there is no information available within all the networks that they operate in. Such collections could unnecessarily pollute the RDCN space.

Descriptor formats

If a database stores RDF graph data in the submitted format, it ought to remove all comments in the serialization, in order to prevent malicious users from using the database as data storage by including comments with arbitrary data.

The statements in the graph SHOULD also be processed and validated. This document recommends putting pieces from unrecognized users into a quarantine to be later reviewed by a human being. Data from trusted users could skip the quarantine in general, but it is recommended to also quarantine it once in a while.