Kueea (author)
<inumi@fumu-no-kagomeko.kueea.cyou>

Resource-descriptor Graphs:
Framework

Draft; do not implement.

This document is part of a series of documents that collectively describe a system called Resource-descriptor Graphs.

It defines the system framework and its basic objects and functions. It is the starting point for those wishing to learn about the system.


Introduction

When building a database system designed for very long-term use, there arises a need for permanent identifiers for its objects. Using provisional identifiers will eventually lead to a situation where a name ceases to refer to the object it has named, making all references to that name incorrect until they are updated. This implies the need for constant database monitoring and maintenance. Users of the system would also need to be informed of any changes made.

This specification defines abstract data objects called ‘resource descriptors’ and a URI scheme for naming them. It also defines an interface for interacting with these objects. Anything else is outside the scope of this document, including mapping said interface to a communication protocol.

Resource descriptors contain knowledge about a topic designated by the URI. They are abstract in the sense that the content of their representation varies depending on the current time and the contacted host. A resource descriptor is neither a specific object nor a network location. You may think of the URI as a precise search term supplied in a query, and of the resource descriptor as the answer to that query, the query being: What do you know about X?

Key words

The key words ‘MUST,’ ‘MUST NOT,’ ‘REQUIRED,’ ‘SHALL,’ ‘SHALL NOT,’ ‘SHOULD,’ ‘SHOULD NOT,’ ‘RECOMMENDED,’ ‘NOT RECOMMENDED,’ ‘MAY,’ and ‘OPTIONAL’ in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

Descriptor identifier

Resource descriptors are identified by the rd URI scheme defined herein.

Syntax

The syntax of these URIs is defined by the following rd-URI ABNF rule. It follows the generic URI syntax defined in [RFC3986]. The reg-name, segment-nz, query and fragment rules are imported from that document.

rd-URI   = "rd://" rd-auth [ rd-path [ "?" query ] [ "#" fragment ] ]
rd-auth  = reg-name / RDGN
RDGN     = 1*RDGN-blk
RDGN-blk = 24b32-char
b32-char = ALPHA / "2" / "3" / "4" / "5" / "6" / "7"
rd-path  = 1*( "/" segment-nz )

Scheme

The rd in the scheme name stands for ‘resource descriptor.’ These identifiers are expected to be stored in large numbers. A two-letter abbreviation was chosen in order to save space and to shorten the computation time of URI comparisons.

Authority

The authority component contains either the canonical Resource-descriptor Graph Number (RDGN) or a registered name for ease of human input.

Resource-descriptor Graph Number (RDGN)

The canonical authority is a Resource-descriptor Graph Number (RDGN). It is a randomly-generated unsigned integer, which identifies a graph (collection) of closely-related resource descriptors.

The maximum value of an RDGN is determined by a variable called the RDGN length. The unit of an RDGN length is a 120-bit block.

The initial length is 1 block. The length is increased one block (120 bits) at a time.

At least one bit of the last block MUST be set. An RDGN with all bits cleared is invalid.

The 15-octet block size ensures that both base64 and base32 encodings of the binary representation produce strings without any padding, because 15 is a multiple of both 3 (the base64 input group) and 5 (the base32 input group). It also leaves one free octet in a 16-octet buffer for use by software, where a last-block marker or the number of remaining blocks could be stored.

The textual representation of an RDGN is constructed by representing the number as a sequence of octets in ascending order of octet significance, i.e. least significant octet first. The resulting sequence of octets (whose length is a multiple of 15) is then encoded into text using an octet-to-text encoding. Within the URI, RDGNs are encoded with the base32 encoding. [RFC4648]

Note: One block (15 octets) produces 24 base32 characters.

Note: base64 cannot be used because URI authorities are case-insensitive.
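The following Python sketch shows one way to produce and parse this textual representation with the standard base64 module. The function names rdgn_to_text and text_to_rdgn are illustrative only, and the caller is assumed to know how many blocks to emit.

```python
import base64

BLOCK_OCTETS = 15  # one 120-bit block
BLOCK_CHARS = 24   # base32 characters per block (no padding needed)


def rdgn_to_text(value: int, blocks: int) -> str:
    """Encode an RDGN as base32 text, least significant octet first."""
    if value <= 0:
        raise ValueError("an RDGN with all bits cleared is invalid")
    # to_bytes raises OverflowError if the value does not fit into `blocks` blocks
    octets = value.to_bytes(blocks * BLOCK_OCTETS, "little")
    return base64.b32encode(octets).decode("ascii")


def text_to_rdgn(text: str) -> int:
    """Decode base32 RDGN text (in either letter case) into an integer."""
    if not text or len(text) % BLOCK_CHARS != 0:
        raise ValueError("RDGN text must consist of whole 24-character blocks")
    octets = base64.b32decode(text, casefold=True)
    value = int.from_bytes(octets, "little")
    if value == 0:
        raise ValueError("an RDGN with all bits cleared is invalid")
    return value


# Round trip of the Anonymous Graph's RDGN, entered in small letters:
n = text_to_rdgn("anonanonanonanonanonanon")
print(rdgn_to_text(n, 1))  # ANONANONANONANONANONANON
```

Decoding accepts small letters because URI authorities are case-insensitive; encoding always emits capital letters.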

Anonymous Graph

RDGN 1745936836749459630212825467061601310 (ANONANONANONANONANONANON in base32) is reserved for the Anonymous Graph.

The Anonymous Graph SHOULD be used in examples.

Implementations MAY define special processing for the Anonymous Graph.

Empty blocks

Trailing empty blocks (those with all bits cleared) SHOULD be removed.

For example, the RDGN ANONANONANONANONANONANONAAAAAAAAAAAAAAAAAAAAAAAA is equal to ANONANONANONANONANONANON.
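Because octets are ordered from least to most significant, trailing blocks are the most significant ones, so removing empty ones does not change the value. Since 15 zero octets always encode to 24 ‘A’ characters, the removal can be done directly on the text. A minimal sketch, with strip_empty_blocks as a hypothetical helper name:

```python
BLOCK_CHARS = 24
EMPTY_BLOCK = "A" * BLOCK_CHARS  # 15 zero octets in base32


def strip_empty_blocks(text: str) -> str:
    """Remove trailing empty blocks from base32 RDGN text.

    At least one block is always kept; an all-zero single block is an
    invalid RDGN anyway and should be rejected elsewhere.
    """
    while len(text) > BLOCK_CHARS and text[-BLOCK_CHARS:].upper() == EMPTY_BLOCK:
        text = text[:-BLOCK_CHARS]
    return text


print(strip_empty_blocks("ANONANONANONANONANONANON" + EMPTY_BLOCK))
# ANONANONANONANONANONANON
```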

RDGN Collector

This technology was created for use in a Kueea Network. [KUEEA] It is a peer-to-peer network, in which a node may advertise that it wishes to take on a given network role.

Taking on the role of an RDGN Collector means that the node will be contacted by other nodes in order to determine the allocation state of an RDGN. The role of a Collector is thus to collect allocated (known) numbers.

It goes without saying that RDGN Collectors SHOULD remember their collected numbers for as long as possible. The minimum information about a given RDGN that a Collector needs to store is a boolean value indicating whether the number has been allocated.

Collectors also keep track of the current RDGN length. Each Collector node independently increases the length when 1% of all numbers of the current length are known to be allocated. In other words, the second block is added after 2^120/100 numbers have been collected, the third after 2^240/100 numbers, etc.
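The growth rule can be expressed with exact integer arithmetic, which avoids rounding 2^120/100. In this sketch the function names are hypothetical, and treating the threshold as reached when the count is at least 1% (rather than strictly more) is an assumption about the boundary case.

```python
BLOCK_BITS = 120


def length_should_grow(length_blocks: int, collected: int) -> bool:
    """True once at least 1% of all numbers of the current length are allocated."""
    # collected / 2**(120 * L) >= 1/100, rearranged to stay in integers
    return collected * 100 >= 2 ** (BLOCK_BITS * length_blocks)


def rdgn_length(collected: int) -> int:
    """RDGN length, in blocks, implied by the number of collected RDGNs."""
    length = 1
    while length_should_grow(length, collected):
        length += 1
    return length
```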

There SHOULD be multiple nodes functioning as an RDGN Collector within a given network because nodes MAY resign from their role at any time. Nodes may also unexpectedly disappear from the network.

Definition of a Collector access protocol is out of scope of this document, although there is one rule that MUST be met: Collectors MUST NOT generate new RDGNs; the numbers MUST always be provided by the client.

The client provides a randomly-generated number to a Collector in order to ensure that the number has really been randomly generated. If a Collector (a remote node) were to control the generation of numbers, it could present numbers that only appear to be random.

If a number has not been allocated yet, it is marked as allocated, in a first-come, first-served fashion.
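A Collector's core state can be sketched as a set of known numbers with first-come, first-served allocation. This in-memory Python class is hypothetical: a real Collector would persist its numbers and expose them through an access protocol, which is out of scope here. Note that it only accepts numbers from clients and never generates them.

```python
from __future__ import annotations


class RdgnCollector:
    """In-memory sketch of a Collector's allocation state."""

    def __init__(self) -> None:
        self._allocated: set[int] = set()

    def allocate(self, rdgn: int) -> bool:
        """Mark a client-supplied RDGN as allocated.

        Returns True if the number was free (first come, first served)
        and False if it had already been allocated.
        """
        if rdgn <= 0:
            raise ValueError("an RDGN with all bits cleared is invalid")
        if rdgn in self._allocated:
            return False
        self._allocated.add(rdgn)
        return True

    def is_allocated(self, rdgn: int) -> bool:
        return rdgn in self._allocated
```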

Number generation

Length of new RDGNs MUST be equal to or greater than the current length.

In order to generate a new RDGN, a node MUST first contact a Collector in order to determine the current RDGN length. It then generates the necessary number of blocks filled with random bits.

If the system has access to a real-time clock, it MAY use the current date and time in the generation of the number. The reason is that the value of the current time only increases, which helps ensure uniqueness (the more precise the clock, the better). This applies only to some unspecified bits of the RDGN, not to all of them; a random number generator is still needed for the remaining bits.
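Number generation might look like the following sketch, which uses Python's secrets module as the random number generator. The function name and the placement of the optional clock value in the lowest 64 bits are assumptions, since the text deliberately leaves the clock-derived bits unspecified. The resulting candidate would then be submitted to a Collector and regenerated if it is already allocated.

```python
import secrets
import time

BLOCK_BITS = 120
CLOCK_MASK = 2**64 - 1  # hypothetical: the lowest 64 bits carry the clock value


def generate_rdgn(length_blocks: int, use_clock: bool = False) -> int:
    """Generate a candidate RDGN with the length reported by a Collector."""
    total_bits = BLOCK_BITS * length_blocks
    while True:
        value = secrets.randbits(total_bits)
        if use_clock:
            # Replace the lowest 64 random bits with the current time in nanoseconds.
            value = (value & ~CLOCK_MASK) | (time.time_ns() & CLOCK_MASK)
        # At least one bit of the last (most significant) block must be set.
        if value >> (total_bits - BLOCK_BITS):
            return value
```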

Registered name

The authority component is treated as a registered name if, and only if, its length is not a multiple of 24 characters or it contains a character not matched by the b32-char rule.
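The rule can be checked with a regular expression for b32-char. In this sketch an empty authority is treated as a registered name, because the RDGN rule requires at least one block; that reading and the function name are assumptions. Note that under this rule a single-label name of exactly 24 letters would be interpreted as an RDGN.

```python
import re

# b32-char = ALPHA / "2" / "3" / "4" / "5" / "6" / "7"
_B32 = re.compile(r"[A-Za-z2-7]+")


def authority_is_rdgn(authority: str) -> bool:
    """True if the authority is an RDGN, False if it is a registered name."""
    return (
        len(authority) > 0
        and len(authority) % 24 == 0
        and _B32.fullmatch(authority) is not None
    )
```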

These names are only defined for ease of human input.

This document defines only one domain of registered names: DNS [RFC1035] domain names.

This document may be updated in the future in order to define additional domains of registered names, although it is believed that such a need will never arise. The authors believe that no system other than DNS is required for storing human-readable registered names, because any such system would ultimately have exactly the same problems as those identified with DNS. Any human-readable public naming system requires a global registry, which must be centrally managed in order to resolve disputes over names. Most problems with the public DNS are not with the database itself, but rather with its management (such as lack of trust in it) or with the protocol used for transferring the data over a network link.

In order to resolve a domain name to an RDGN, issue a query with QNAME set to the domain name and QTYPE set to TXT. Then, look for TXT records in the answer section that match the following rd-DNS rule and extract the value of the RDGN from the first record that matched.

rd-DNS   = %s"RDGN " RDGN
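Extracting the RDGN from an answer section could be sketched as follows. Performing the DNS query itself is left to a DNS client library; here each TXT record is assumed to be given as its list of character-strings, and joining them without a separator is an assumption borrowed from common TXT usage. The function name is hypothetical.

```python
from __future__ import annotations

import re

# rd-DNS = %s"RDGN " RDGN: the prefix is case-sensitive, b32-char is not
RD_DNS = re.compile(r"RDGN ((?:[A-Za-z2-7]{24})+)")


def rdgn_from_txt(records: list[list[bytes]]) -> str | None:
    """Return the RDGN of the first TXT record matching rd-DNS, or None."""
    for strings in records:
        text = b"".join(strings).decode("ascii", errors="replace")
        match = RD_DNS.fullmatch(text)
        if match:
            return match.group(1)
    return None
```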

Path

The path is a human-readable, case-sensitive name of a resource descriptor, in the namespace identified by the RDGN in the authority.

Although the syntax defines paths as hierarchical, resource descriptors are considered to be nodes of a graph (which may not be a tree). There is no concept of parents and children within the namespace. In other words, the existence of /a/b/c does not imply the existence of /a nor the existence of /a/b.

For clarity: Paths MUST NOT end with a U+002F SOLIDUS character (/), and empty path segments are not permitted, i.e. there MUST NOT be two consecutive U+002F SOLIDUS characters (//) within the path component.
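A path can be validated against rd-path with a regular expression built from the RFC 3986 pchar rule, which also rejects a trailing solidus and empty segments. A sketch, with a hypothetical function name:

```python
import re

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"   (RFC 3986)
_PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
# rd-path = 1*( "/" segment-nz ), segment-nz = 1*pchar
_RD_PATH = re.compile(rf"(?:/{_PCHAR}+)+")


def is_valid_rd_path(path: str) -> bool:
    """Check a path against rd-path: no trailing "/" and no empty segments."""
    return _RD_PATH.fullmatch(path) is not None
```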

Each resource descriptor should contain knowledge that is narrow in scope. Descriptors are to form a graph structure that applications traverse. The names ought to be chosen so that the amount of knowledge within one descriptor is concise and can be processed relatively fast. Large amounts of information should be split among multiple descriptors so that applications do not waste time processing unnecessary data.

For example, if a Compact Disc is to be described, only information about the disc itself should be present and nothing else. Even if there are songs on the disc, a song is not a disc – it should have its own descriptor.

The /ROOT descriptor

Every graph MUST contain at least the descriptor under the path /ROOT, which is referred to as the graph's Root Descriptor. It is the first descriptor an application retrieves.

It contains information about the classes of the graph and references to other nodes (descriptors) that exist within the graph.

Only nearby nodes SHOULD be referenced by a node, i.e. the minimal set required to reach all other nodes.

There MAY be ‘hidden’ nodes in a graph, that is, nodes to which no reference exists in any node reachable by following references from the Root Descriptor.

A specific use case is graphs containing information about resources identified by some external identifier, which is used as the name of the corresponding descriptor. Since the user already knows the external identifier, it can access the descriptor directly (by mapping the external identifier to a path).

Query

The query component is a serialized list of name-value pairs.

The serialization algorithm is as follows. Let output be an empty sequence of octets. For each pair:

  1. Encode both the name and value using UTF-8. [RFC3629]
  2. If output is not empty, append a U+0026 AMPERSAND character (&).
  3. Append the name.
  4. Append a U+003D EQUALS SIGN character (=).
  5. Append the value.

Once all pairs have been processed, output is the serialized query.

The space of query parameters is defined separately for the retrieval function and for the submission function.
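A direct Python rendering of the serialization steps is shown below. The steps define no escaping of & or =, so this sketch rejects inputs that would make the output ambiguous; that check is an added assumption, as is the function name. Using the result inside a URI would additionally require percent-encoding of octets outside the URI character set, which the steps do not cover.

```python
from __future__ import annotations


def serialize_query(pairs: list[tuple[str, str]]) -> bytes:
    """Serialize name-value pairs into a query component (as octets)."""
    output = bytearray()
    for name, value in pairs:
        # Added assumption: refuse names or values that would be ambiguous.
        if "&" in name or "=" in name or "&" in value:
            raise ValueError(f"cannot serialize {name!r}={value!r} unambiguously")
        name_octets, value_octets = name.encode("utf-8"), value.encode("utf-8")  # step 1
        if output:                                                              # step 2
            output += b"&"
        output += name_octets + b"=" + value_octets                            # steps 3-5
    return bytes(output)


print(serialize_query([("format", "text/turtle"), ("schemes", "x-a x-b")]))
# b'format=text/turtle&schemes=x-a x-b'
```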

Normalization

Do the following in order to normalize an rd URI.

  1. If the authority is a registered name, dereference the name and modify the authority component accordingly.
  2. Remove the query component.

When visually presenting an rd URI to a user, RDGN authorities SHOULD be printed using capital letters and registered names using small letters (for clarity).
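Normalization and presentation can be sketched with urllib.parse from the Python standard library. The resolve_name callback is hypothetical and stands for dereferencing a registered name (for DNS names, via the TXT lookup defined for registered names). The fragment is kept because the steps only remove the query.

```python
import re
from typing import Callable
from urllib.parse import urlsplit, urlunsplit

_RDGN = re.compile(r"(?:[A-Za-z2-7]{24})+")


def normalize_rd_uri(uri: str, resolve_name: Callable[[str], str]) -> str:
    """Dereference a registered-name authority and remove the query."""
    parts = urlsplit(uri)
    authority = parts.netloc
    if not _RDGN.fullmatch(authority):  # registered name
        authority = resolve_name(authority)
    return urlunsplit((parts.scheme, authority, parts.path, "", parts.fragment))


def present_rd_uri(uri: str) -> str:
    """Capital letters for RDGN authorities, small letters for registered names."""
    parts = urlsplit(uri)
    authority = parts.netloc
    authority = authority.upper() if _RDGN.fullmatch(authority) else authority.lower()
    return urlunsplit((parts.scheme, authority, parts.path, parts.query, parts.fragment))
```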

URI comparison

When comparing URIs according to this framework:

Descriptor definition

A resource descriptor is a unique mapping of piece identifiers to descriptor pieces.

A piece identifier is a character string with the same syntax as the value of the Message-ID field of Internet Messages. [RFC2822]

A descriptor piece is a pair of an RDF graph and a unique mapping of a signer identifier to a signature.

An RDF graph is a non-empty set of RDF statements. [RDF]

A signer identifier is a URI, usually an acct URI. [RFC7565]

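A signature is a tuple of:

  1. a creation time,
  2. an expiration time,
  3. a confidence level,
  4. a signature scheme, and
  5. signature data.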

Signatures are described in more detail in another section.
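The definitions above map naturally onto a few data classes. This sketch is illustrative: the class names are hypothetical, RDF statements are modelled as opaque hashable values (for example, tuples), the fields of Signature follow the signature properties defined in the subsections on signatures, and the piece identifier check only approximates the Message-ID syntax.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Simplified msg-id shape "<left@right>"; RFC 2822 allows more forms.
_MSG_ID = re.compile(r"<[^<>@\s]+@[^<>@\s]+>")


@dataclass(frozen=True)
class Signature:
    created: str     # <sig-create>
    expires: str     # <sig-expire>
    confidence: str  # <sig-clevel>
    scheme: str      # <sig-scheme>
    data: bytes      # signature data produced by the scheme


@dataclass
class DescriptorPiece:
    graph: frozenset  # non-empty set of RDF statements
    signatures: dict[str, Signature] = field(default_factory=dict)  # signer id -> signature


@dataclass
class ResourceDescriptor:
    pieces: dict[str, DescriptorPiece] = field(default_factory=dict)  # piece id -> piece

    def add(self, piece_id: str, piece: DescriptorPiece) -> None:
        if not _MSG_ID.fullmatch(piece_id):
            raise ValueError("piece identifier must look like a Message-ID")
        if not piece.graph:
            raise ValueError("an RDF graph is a non-empty set of statements")
        self.pieces[piece_id] = piece
```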

Graph requirements

Descriptors are meant to store descriptions of other resources. Graphs containing statements that, for example, state that file F has content C, where C lists all bytes of the file, must be rejected. While such a description is theoretically correct, it is not the kind of data this framework is designed for.

Signatures

Signatures are assessments by a user that all of the statements contained within the RDF graph of a piece are correct and true. The user also states its confidence in the assessment.

Creation time

A signature's creation time is a specific date and time.

Creation time is represented as <sig-create>:

sig-create = date-time

The <date-time> rule is imported from [RFC3339].

Data necessary for verifying a signature is obtained by dereferencing the signer identifier. The method of resolution is out of scope of this document.

Other documents specify how exactly to dereference them. The mechanisms are expected to change over time, although they will be rather tied to the URI scheme. For example, an HTTP URI could be dereferenced as usual when the creation time is recent, and redirected to a Web archive when it is old, because the resource is probably no longer available at its original location.

In any case, the resolution of a signer identifier produces a list of data structures containing signature verification data. Each such structure contains the validity period of its data.

The correct structures to use when verifying signature data are those whose validity period covers the creation time.
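Selecting the verification data for a signature might look like this sketch. The VerificationData structure is hypothetical, since its actual fields are defined by each signature scheme; treating the period as half-open (start inclusive, end exclusive) and allowing an open end are assumptions. Before Python 3.11, datetime.fromisoformat accepts only a subset of RFC 3339 fractional seconds.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


def parse_date_time(text: str) -> datetime:
    """Parse an RFC 3339 <date-time> via datetime.fromisoformat."""
    text = text.replace("t", "T").replace("z", "Z")
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


@dataclass
class VerificationData:
    """Hypothetical entry produced by dereferencing a signer identifier."""
    scheme: str
    valid_from: datetime
    valid_until: datetime | None  # None: no end of the period known yet (assumption)


def usable_entries(entries: list[VerificationData], sig_create: str) -> list[VerificationData]:
    """Entries whose period covers the signature's creation time."""
    created = parse_date_time(sig_create)
    return [
        e for e in entries
        if e.valid_from <= created and (e.valid_until is None or created < e.valid_until)
    ]
```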

Expiration time

A signature's expiration time is either a specific date and time or a keyword.

Expiration time is represented as <sig-expire>:

sig-expire = date-time / sig-expkey
sig-expkey = %s"never"

Once the specified time is reached, the signature expires. Once all of its signatures have expired, a piece is said to have expired, too. Such pieces are usually deleted from a database.

The only keyword defined by this document is never, indicating that the signature never expires. This value should be used only when the signer's confidence is very high.
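Expiration checks follow directly from the <sig-expire> rule. A sketch with hypothetical function names; date-time parsing relies on datetime.fromisoformat after mapping a trailing Z to +00:00.

```python
from __future__ import annotations

from datetime import datetime


def parse_date_time(text: str) -> datetime:
    """Parse an RFC 3339 <date-time> via datetime.fromisoformat."""
    text = text.replace("t", "T").replace("z", "Z")
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


def signature_expired(sig_expire: str, now: datetime) -> bool:
    """True once the expiration time has been reached; "never" never expires."""
    if sig_expire == "never":  # %s"never" is case-sensitive
        return False
    return now >= parse_date_time(sig_expire)


def piece_expired(expirations: list[str], now: datetime) -> bool:
    """A piece expires once all of its signatures have expired.

    A piece without any signatures counts as expired (all() of nothing).
    """
    return all(signature_expired(e, now) for e in expirations)
```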

Confidence level

The confidence level expresses the signer's confidence in its assessment of the truthfulness of the information in the graph. It is a numerical value in the range between 0.001 and 1.000, inclusive, in steps of 0.001.

Confidence is represented as <sig-clevel>:

sig-clevel = %s"full" / 2DIGIT "." DIGIT

The value 1.000 is represented by the keyword full. Other values are expressed as a percentage with one decimal digit and without the percent sign; for example, 0.5 is written as 50.0 and 0.001 as 00.1.

User software assigns credibility values to RDF statements. The value is computed from the user's trust in the signer (a user-configurable multiplier value) and the signer's confidence. It is the product of the two values.
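Representing the confidence level as an integer number of thousandths avoids floating-point rounding when parsing and formatting <sig-clevel>. The sketch below also rejects 00.0, which the ABNF permits but which lies below the stated minimum of 0.001. The function names are hypothetical, and the trust value is assumed to be a plain number configured by the user.

```python
import re

_CLEVEL = re.compile(r"full|[0-9]{2}\.[0-9]")


def parse_clevel(text: str) -> int:
    """Parse <sig-clevel> into thousandths (1..1000)."""
    if not _CLEVEL.fullmatch(text):
        raise ValueError(f"not a sig-clevel: {text!r}")
    if text == "full":
        return 1000
    thousandths = int(text.replace(".", ""))  # "50.0" (50.0 %) -> 500
    if thousandths == 0:
        raise ValueError("confidence must be at least 0.001")
    return thousandths


def format_clevel(thousandths: int) -> str:
    """Format thousandths (1..1000) as <sig-clevel>."""
    if not 1 <= thousandths <= 1000:
        raise ValueError("confidence must be between 0.001 and 1.000")
    if thousandths == 1000:
        return "full"
    return f"{thousandths // 10:02d}.{thousandths % 10}"


def credibility(trust: float, clevel: str) -> float:
    """User's trust in the signer multiplied by the signer's confidence."""
    return trust * parse_clevel(clevel) / 1000
```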

The point of this is to encourage users to verify information before signing it and to make maintaining a high level of trust from others their goal, because a change in trust affects all signatures they have made in the past.

The system deliberately stores no data regarding the user who submitted the information in order to put emphasis on what is being stated and the users' ability to verify the information presented to them. Users that only produce or spread information are not trustworthy.

Signature scheme

The signature scheme is a short name that identifies a particular method of verifying signature data. It is a mandatory field of signature verification data structures. The scheme defines the fields of the structure and what to do with them.

This document does not define any signature schemes.

Signature scheme is represented as <sig-scheme>:

sig-scheme = 1*(ALPHA / DIGIT / "-")

Signature data

A signature's signature data is generated as follows:

  1. Let input be an empty sequence of octets.
  2. Let graph be the piece's RDF graph.
  3. Append the signer identifier <URI> to input.
  4. Append the creation time as <sig-create> to input.
  5. Append the expiration time as <sig-expire> to input.
  6. Append the confidence level as <sig-clevel> to input.
  7. Append the scheme identifier as <sig-scheme> to input.
  8. Encode graph into its application/prs.inumi.rdg-graph representation [RDG-GRAPH] and append the resulting sequence of octets to input.
  9. Apply the signature scheme on input and return the result.

Encode all components except the graph representation with US-ASCII [RFC20].

This algorithm ensures that signatures do not depend on the representation of a descriptor, of RDF graphs in particular.
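Steps 1 to 8 can be sketched as a function that concatenates the textual components without separators, exactly as listed, and appends the encoded graph. The graph encoding [RDG-GRAPH] is outside this sketch, so it is passed in as octets; the sign callable standing for step 9 and the function names are hypothetical.

```python
from typing import Callable


def signature_input(signer: str, created: str, expires: str, clevel: str,
                    scheme: str, graph_octets: bytes) -> bytes:
    """Octets the signature scheme is applied to (steps 1-8)."""
    # Steps 3-7: textual components in US-ASCII, concatenated without separators;
    # encode("ascii") raises UnicodeEncodeError for non-ASCII input.
    text = signer + created + expires + clevel + scheme
    # Step 8: graph_octets is assumed to already be the application/prs.inumi.rdg-graph encoding.
    return text.encode("ascii") + graph_octets


def signature_data(sign: Callable[[bytes], bytes], signer: str, created: str,
                   expires: str, clevel: str, scheme: str, graph_octets: bytes) -> bytes:
    """Step 9: apply the (hypothetical) scheme-specific sign function."""
    return sign(signature_input(signer, created, expires, clevel, scheme, graph_octets))
```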

Endpoint signature

Any signature scheme will eventually become obsolete.

When signatures are transferred, they should be signed by the sender. Receiving endpoints should store them together with the sender's signature if the sender is another endpoint, or sign them themselves if the sender is a user. Endpoint signatures MUST be updated before they expire. Signatures without a valid endpoint signature are considered expired.

Note that trusted endpoints only need to update their own signatures. If an endpoint receives signatures from another, trusted endpoint, it signs the received data itself (because it trusts the sender). The (user) signatures are then signed by both endpoints.

This protects against counterfeit signatures from an impostor who labels a signature as created in the past, using obsolete verification data, when in fact the impostor crafted it recently. Endpoint signatures verify that signatures were received at the right time.

Endpoint signatures are computed over the following octet sequence, given a list of signatures in ascending order by creation time:

  1. Let data be an empty octet sequence.
  2. For each signature to be signed:
    1. Append the signer identifier <URI> to data.
    2. Append the creation time as <sig-create> to data.
    3. Append the expiration time as <sig-expire> to data.
    4. Append the confidence level as <sig-clevel> to data.
    5. Append the scheme identifier as <sig-scheme> to data.
    6. Append the signature data to data.
  3. Return data.

Encode all components except signature data with US-ASCII [RFC20]. Signatures with equal creation times are ordered by their signer identifiers.
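The ordering and concatenation can be sketched as follows. The SignatureRecord class and function names are hypothetical; comparing creation times as instants (rather than as strings) and breaking ties by comparing signer identifiers octet by octet are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


def parse_date_time(text: str) -> datetime:
    """Parse an RFC 3339 <date-time> via datetime.fromisoformat."""
    text = text.replace("t", "T").replace("z", "Z")
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


@dataclass(frozen=True)
class SignatureRecord:
    signer: str   # signer identifier <URI>
    created: str  # <sig-create>
    expires: str  # <sig-expire>
    clevel: str   # <sig-clevel>
    scheme: str   # <sig-scheme>
    data: bytes   # signature data


def endpoint_signature_input(signatures: list[SignatureRecord]) -> bytes:
    """Octets an endpoint signs over a list of (user) signatures."""
    ordered = sorted(
        signatures,
        key=lambda s: (parse_date_time(s.created), s.signer.encode("ascii")),
    )
    data = bytearray()
    for s in ordered:
        data += (s.signer + s.created + s.expires + s.clevel + s.scheme).encode("ascii")
        data += s.data
    return bytes(data)
```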

When a user submits a descriptor piece to an endpoint, the user should treat itself as the endpoint and self-sign, using the same signer identifier for both signatures.

Resolution / Interface

The interface of a resolver has two functions: retrieval and submission.

This document does not define any resolution mechanism for identifiers without a path component (for resolving graphs).

Both of these functions take a descriptor identifier as input. Parameters are extracted from the query component of the identifier, and the URI is then normalized, which removes the query component.

Graph data is stored at Resource-descriptor Graph Endpoints. Endpoints are referenced by a URI or a domain name.

Documents that define protocols for these endpoints define the syntax of their URIs and how to map a domain name to a URI. The documents MAY also override default values of parameters.

The rd-graph.home.arpa domain name

This document defines the DNS domain name rd-graph.home.arpa. It is a name within a residential home network. [RFC8375]

Availability of this name is REQUIRED to comply with this document. It is configured locally per site as an opt-in to the RDG system and MUST locate at least one Resource-descriptor Graph Endpoint, according to the specifications of said endpoint.

Common parameters

This section lists parameters for both retrieval and submission.

The endpoints parameter

The value of the endpoints parameter is a space-separated list of Resource-descriptor-Graph-Endpoint URIs and domain names.

By default, the list consists of the rd-graph.home.arpa domain name. If the authority in the input rd URI was a domain name, the name is also included in the list.

Examples:

Input URI: rd://example.com/desc
Endpoint: example.com
Endpoint: rd-graph.home.arpa

Input URI: rd://ANONANONANONANONANONANON/desc
Endpoint: rd-graph.home.arpa

Input URI: rd://example.com/desc?endpoints=example.org+http://example.com/rdg
Endpoint: example.org
Endpoint: http://example.com/rdg
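Deriving the list of endpoints could be sketched as follows. The examples suggest that an explicit endpoints parameter replaces the default list entirely, including the authority's domain name, and that + stands for a space; the sketch follows both readings by using form-style decoding (urllib.parse.parse_qsl). The function name is hypothetical.

```python
from __future__ import annotations

import re
from urllib.parse import parse_qsl, urlsplit

HOME_ENDPOINT = "rd-graph.home.arpa"
_RDGN = re.compile(r"(?:[A-Za-z2-7]{24})+")


def endpoint_list(rd_uri: str) -> list[str]:
    """Endpoints to contact for an rd URI, per the endpoints parameter."""
    parts = urlsplit(rd_uri)
    params = dict(parse_qsl(parts.query))  # "+" decodes to a space here
    if "endpoints" in params:
        return params["endpoints"].split()
    endpoints = []
    if not _RDGN.fullmatch(parts.netloc):  # the authority is a domain name
        endpoints.append(parts.netloc)
    endpoints.append(HOME_ENDPOINT)
    return endpoints
```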

Retrieval

Input is a descriptor identifier.

Output is a data object whose format depends on the format parameter.

The format parameter

The format parameter identifies the format of the representation. The value is either a media type or a URI of the data format.

By default, the value is multipart/prs.inumi.rdg-mime. [RDG-MIME]

The sig-include and sig-exclude parameters

The sig-include and sig-exclude parameters are both a space-separated list of POSIX Extended Regular Expressions or URIs.

Regular expressions begin with the ^ character; any other character string is a specific URI.

The output MUST contain only those descriptor pieces that are signed by at least one signer whose URI matches at least one element of the sig-include list and none of the elements of the sig-exclude list.

By default, sig-include contains all signers and sig-exclude is empty.

The schemes parameter

The schemes parameter is a space-separated list of signature schemes.

The output MUST contain only those descriptor pieces which have at least one signature generated using a scheme listed in the schemes list.
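The three retrieval filters can be sketched as two predicates. Python's re module stands in for POSIX Extended Regular Expressions here; the dialects agree on simple patterns but are not identical. Using None for the default sig-include list (all signers) and for an absent schemes parameter (no restriction, as no default is stated) are assumptions, as are the function names.

```python
from __future__ import annotations

import re


def signer_selected(signer: str, include: list[str] | None, exclude: list[str]) -> bool:
    """Apply sig-include / sig-exclude to one signer URI."""
    def matches(item: str) -> bool:
        if item.startswith("^"):  # regular expression
            return re.search(item, signer) is not None
        return item == signer     # specific URI

    included = include is None or any(matches(i) for i in include)
    return included and not any(matches(e) for e in exclude)


def piece_selected(signatures: dict[str, str], include: list[str] | None,
                   exclude: list[str], schemes: list[str] | None) -> bool:
    """signatures maps each signer identifier to the scheme it used."""
    if not any(signer_selected(s, include, exclude) for s in signatures):
        return False
    return schemes is None or any(sc in schemes for sc in signatures.values())
```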

Submission

Input is a descriptor identifier and a descriptor piece.

Output is a list of pairs of an endpoint URI and a character string.

The first character of the string indicates the result of a submission. It is one of: success (S), partial (P) or failure (F). Subsequent characters SHOULD contain a human-readable message for the user.

Success means that all submitted data was accepted.

Partial means that only a portion of the submitted data was accepted.

Failure means that none of the submitted data was accepted.

The resolver contacts endpoints one after another and submits the received descriptor piece via a protocol the endpoint supports. For each endpoint, a pair indicating the result is appended to the list.
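A resolver's submission loop could be sketched as follows. The submit callable is hypothetical and stands for a protocol-specific client returning a result string; reporting unreachable endpoints and malformed responses as failures is an assumption.

```python
from __future__ import annotations

from typing import Callable

STATUS = {"S": "success", "P": "partial", "F": "failure"}


def submit_to_all(endpoints: list[str], piece: object,
                  submit: Callable[[str, object], str]) -> list[tuple[str, str]]:
    """Submit a piece to each endpoint in turn; collect (endpoint, result) pairs."""
    results = []
    for endpoint in endpoints:
        try:
            result = submit(endpoint, piece)
        except OSError as exc:  # unreachable endpoint: report as failure
            result = f"F{exc}"
        if not result or result[0] not in STATUS:
            result = "Finvalid response from endpoint"
        results.append((endpoint, result))
    return results


def describe(result: str) -> tuple[str, str]:
    """Split a result string into its status and human-readable message."""
    return STATUS[result[0]], result[1:]
```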

Two kinds of submission

Submissions are either from a user or from another endpoint.

When a user submits to an endpoint, the user treats itself as the endpoint and self-signs its signature, using the same signer identifier in both.

Otherwise, the submission is from another endpoint.

Endpoint behaviour

Endpoints MUST process the submitted descriptor piece as follows.

A piece with an empty graph is interpreted as a piece reference, in case the protocol used does not allow for explicit references.

  1. Let S be the submitted piece.
  2. Let P be a reference to a descriptor piece.
  3. If the graph of S is a reference:
    1. Find a descriptor by its identifier; if not found, return failure.
    2. Find a piece by the reference; if not found, return failure.
    3. Set P to the found piece.
    Otherwise (if the graph is not a reference):
    1. Find a descriptor by its identifier; if not found, create it.
    2. Find a piece with an identifier equal to that of S. If found:
      1. Set P to the found piece.
      2. Return failure if the set of statements in the graph of S is not exactly the same as in the graph of P.
      Otherwise (if piece was not found):
      1. If the graph of S does not meet requirements of this document or those of the endpoint, return failure.
      2. Insert S into the descriptor.
      3. Set P to S.
  4. Verify the signatures in S over the graph of P.
  5. Discard all signatures that failed to verify from S.
  6. Return failure if no signatures remain in S.
  7. Update the signatures in P with the ones in S.
  8. Return success (all signatures valid) or partial.
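The endpoint algorithm can be sketched over an in-memory store. This is illustrative only: statements are opaque hashable values, and verify (signature verification) and acceptable (the graph requirements of this document and of the endpoint) are hypothetical callbacks. As in the steps, invalid signatures are discarded from S in place, so a newly inserted piece keeps only verified signatures.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Piece:
    piece_id: str                                   # Message-ID-like identifier
    graph: frozenset                                # empty graph = piece reference
    signatures: dict = field(default_factory=dict)  # signer identifier -> signature


def process_submission(
    store: dict[str, dict[str, Piece]],             # descriptor id -> piece id -> piece
    descriptor_id: str,
    s: Piece,
    verify: Callable[[str, object, frozenset], bool],
    acceptable: Callable[[frozenset], bool],
) -> str:
    """Process a submitted piece S; returns "S", "P" or "F"."""
    if not s.graph:                                 # step 3: S is a reference
        descriptor = store.get(descriptor_id)
        if descriptor is None:
            return "F"
        p = descriptor.get(s.piece_id)
        if p is None:
            return "F"
    else:
        descriptor = store.setdefault(descriptor_id, {})
        p = descriptor.get(s.piece_id)
        if p is not None:
            if s.graph != p.graph:                  # statements must match exactly
                return "F"
        elif not acceptable(s.graph):
            return "F"
        else:
            descriptor[s.piece_id] = s
            p = s
    submitted = len(s.signatures)
    # Steps 4-5: verify over the graph of P, discarding failures from S in place.
    for signer in [k for k, sig in s.signatures.items() if not verify(k, sig, p.graph)]:
        del s.signatures[signer]
    if not s.signatures:                            # step 6
        return "F"
    p.signatures.update(s.signatures)               # step 7 (no-op when P is S)
    return "S" if len(s.signatures) == submitted else "P"  # step 8
```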

Note that a new signature may have an expiration time that is equal to the current time or lies in the past, which effectively revokes it.

Pieces that have expired SHOULD be removed, and those that have not SHOULD be kept until they do. It is up to each endpoint to decide which pieces it keeps and for how long. Users have no guarantee that endpoints will keep storing their data. It is most desirable that users run their own endpoints instead of relying on others to store and serve the data.

Security considerations

RDGN length

Because the length of an RDGN is a variable that increases with the passage of time, it can be used to determine a period of time within which a given RDGN was allocated.

There will be numbers allocated ‘before’ and ‘after’ each increase.

Since some of the numbers are intended to identify user accounts, there is a risk of discrimination based on this property, i.e. based on the age of a number, which reveals the age of the account and, in turn, the user's age.

RDGN Collectors

RDGN Collectors SHOULD be protected against flooding attacks in some way, although this can be said about any kind of public network service.

Collectors MAY also forget numbers for which no information is available in any of the networks they operate in. Such numbers could unnecessarily pollute the RDGN space.

Endpoints

If an endpoint stores RDF graph data in the submitted format, it ought to remove all comments from the serialization, in order to prevent malicious users from using the endpoint as data storage by including comments containing arbitrary data.

The statements in the graph SHOULD also be processed and validated. This document recommends putting pieces from unrecognized users into a quarantine to be later reviewed by a human being. Data from trusted users could skip the quarantine in general, but it is recommended to also quarantine it once in a while.

IANA considerations

URI scheme registration to be written.