FUNDAMENTALS OF CONTROLLED DOCUMENT MANAGEMENT

WHAT IS A CONTROLLED DOCUMENT

Controlled documents have characteristics that differentiate them from other types of documents. The management of these types of documents is governed by strict rules, often dictated by regulatory or compliance obligations. Examples of controlled documents include standards, procedures and policies.

The following are typical key characteristics of controlled documents.

Review and approval. All controlled documents are subjected to review and approval. A document must be formally approved before it can be released and used.
Formal issuance and withdrawal. Controlled documents are formally issued for use, and withdrawn from use when no longer valid.
Currency status and distribution control. All controlled documents must have a currency status; for example, a document may be current, superseded, obsolete or canceled. Controlled documents that are no longer current must be withdrawn from use so that they are not inadvertently used.
Issue history. Different issues or releases of a controlled document must be uniquely identified (e.g. with a revision or version number).
Directive authority. Controlled documents, such as standard operating procedures, typically carry directive authority. That is, the document gives instructions that must be followed.
Periodic review. Controlled documents that carry directive authority, such as standards, procedures and policies, must be reviewed on a periodic schedule and revised and re-issued as appropriate. Engineering drawings are also a class of controlled document, but they are generally not reviewed on a time-based schedule; they are revised in response to other triggers.
Unique identification. Controlled documents must be uniquely identifiable. Generally this is via a document ID, being a string of characters, a sequence number or a combination thereof.

Effectively managing controlled documents requires a system specifically designed to handle these unique characteristics.

ESSENTIAL REQUIREMENTS OF A CONTROLLED DOCUMENT MANAGEMENT SYSTEM

Structure and Relationships

The characteristics of controlled documents dictate that they must be managed in a system which has structure. Furthermore, the management of relationships between different entities must be embodied within the structure. For example, the different releases of a particular document are all related to the same document ID. At the same time, the document ID must be unique.

To effectively manage these relationships a robust data management platform is essential. Relational databases are the most appropriate platform for this purpose. A relational database natively enforces referential integrity and manages relationships. To effectively manage controlled documents in a relational database the database structure must be designed to mirror the real-world structure of how controlled documents must be managed.

Document Identification

A fundamental rule of controlled document management systems (CDMS) is that each document must be uniquely identified. The CDMS must enforce the uniqueness of each document ID. The most effective way to implement this requirement is via a single register of document IDs. Hence the first requirement of a CDMS is that it must contain a single register of document IDs, with an associated constraint that all entries must be unique.

The document ID which must be unique is that which is displayed on the published document. Document IDs are cross-referenced in the text of documents, and they must be easy to communicate verbally and to transcribe, i.e. they must be human-friendly. For this reason GUIDs (globally unique identifiers) are unsuitable for use as document IDs, even though they are unique for practical purposes without needing to be managed in a single register. The identifiers of national and international standards (e.g. ISO 9001, IEC 60167, IEEE 1584, ANSI Z535.1) are good examples of document IDs.

Conceptually a document ID is an abstract entity separate from the files which represent the actual document object. With a dedicated document ID register the function of reserving document IDs, for example, is straightforward. There is no need to supply or create dummy files in order to reserve document IDs.
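As a minimal sketch of such a register, the following uses Python's built-in sqlite3 module with an in-memory database. The table name document_register, the helper register_document_id and the sample IDs are hypothetical, chosen only for illustration. The point is that uniqueness is declared in the schema, and that an ID can be reserved before any file exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE document_register (
        doc_id TEXT NOT NULL UNIQUE,  -- the ID exactly as printed on the document
        title  TEXT                   -- may stay empty while the ID is only reserved
    )
    """
)


def register_document_id(doc_id, title=None):
    """Add an ID to the register. Returns False if the ID is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO document_register (doc_id, title) VALUES (?, ?)",
                (doc_id, title),
            )
        return True
    except sqlite3.IntegrityError:
        return False


reserved = register_document_id("ABC-123")            # reserved, no file exists yet
vendor = register_document_id("VND/4471/A")           # no format rule beyond uniqueness
duplicate = register_document_id("ABC-123", "Other")  # rejected by the schema
id_count = conn.execute("SELECT COUNT(*) FROM document_register").fetchone()[0]
```

Because the constraint lives in the table definition, the duplicate is refused no matter which application path attempts the insert.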

For various reasons (such as inflexibility, explained in more detail elsewhere on the web) embedding metadata into document IDs is poor practice. The advent of relational databases many decades ago obviated the need to use a structured document ID as the primary means of searching, sorting and grouping documents. The only structure which may be applicable is one which indicates the namespace or originating register (e.g. ISO, IEC, IEEE, ANSI), ensuring its unique context.

Although metadata should not be embedded within a document ID, a CDMS should not impose any constraint on the format of the document ID other than uniqueness. Most organizations will use some type of document ID format, and the organization should be able to decide on that format independently of the CDMS. The CDMS should also maintain only a single document ID register and not a parallel or proxy register. To guarantee that no duplicate document identifiers can be created, the CDMS must not maintain separate registers for different document types, because that would allow the same identifier to be configured in more than one register.

Relationship Between Documents and Files

While enforcing the uniqueness of the document ID, the CDMS must also cater for the fact that a single document ID is related to a series of releases. Within each release there will typically be two renditions: a source file (e.g. .docx or .dwg) and a published file (.pdf). In principle, the published rendition of a single release may comprise multiple files, such as a main document and a separate appendix file in a different format. The document ID is not a metadata attribute of the file. Rather, the file is related to the document ID: the document ID is the parent and the file is the child. This logical separation of the document ID from the physical files is critical, allowing for flexible management of various renditions and formats over time. The CDMS must be flexible enough to handle these requirements and mirror the real-world nature of the data. Relational databases are specifically designed to efficiently manage these types of relationships.
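The parent and child relationship described above can be sketched as three related tables. This is an illustrative schema only, written with Python's sqlite3 module. The table names (document_register, document_release, release_file), the column names and the filenames are assumptions, not taken from any particular product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE document_register (
        doc_id TEXT PRIMARY KEY NOT NULL
    );
    CREATE TABLE document_release (
        release_pk INTEGER PRIMARY KEY,
        doc_id     TEXT NOT NULL REFERENCES document_register(doc_id),
        release_id TEXT NOT NULL,
        UNIQUE (doc_id, release_id)
    );
    CREATE TABLE release_file (
        file_pk    INTEGER PRIMARY KEY,
        release_pk INTEGER NOT NULL REFERENCES document_release(release_pk),
        rendition  TEXT NOT NULL CHECK (rendition IN ('source', 'published')),
        filename   TEXT NOT NULL
    );
    """
)

with conn:
    conn.execute("INSERT INTO document_register (doc_id) VALUES ('2415-004-C-102')")
    rev0 = conn.execute(
        "INSERT INTO document_release (doc_id, release_id) VALUES ('2415-004-C-102', '0')"
    ).lastrowid
    # One release: a source file plus a published rendition made of two files.
    conn.executemany(
        "INSERT INTO release_file (release_pk, rendition, filename) VALUES (?, ?, ?)",
        [
            (rev0, "source", "2415-004-C-102.dwg"),
            (rev0, "published", "2415-004-C-102.pdf"),
            (rev0, "published", "2415-004-C-102-appendix.xlsx"),
        ],
    )

# A file cannot be attached to a release that does not exist.
try:
    with conn:
        conn.execute(
            "INSERT INTO release_file (release_pk, rendition, filename) "
            "VALUES (999, 'published', 'orphan.pdf')"
        )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

file_count = conn.execute("SELECT COUNT(*) FROM release_file").fetchone()[0]
```

The file rows carry no document ID of their own. They reach it only through their parent release, which is the sense in which the file is the child.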

User Interface and Control

Document search results must be presented to end users in such a manner that users are not likely to inadvertently use a document which is not the latest release of a current document. Documents which have been canceled, or releases which have been superseded, should not be presented without additional deliberate actions by the user or perhaps specific authorization. The most efficient way to handle these requirements is via metadata.

Given that users must be able to download document files for various reasons, a strategy must be applied to reduce the risk of non-current documents being used. The simplest and most efficient strategy for ensuring that users are only referring to the current release of a document is to explicitly communicate on each document that the master copy resides in a given nominated single repository. Documents viewed through channels outside of that repository are defined as uncontrolled by default. Thus, the designation of what constitutes a 'controlled copy' is not by what is displayed on the document, or whether or not it is printed, but by the user interface channel through which it is being viewed.

Metadata

Metadata is used in a CDMS both for controlling documents and for searching, sorting and grouping documents and communicating information.

Some metadata is related to the document ID and some is related to the file. The metadata which is related to the document ID should only be that which does not change from one release to the next. For example, the title and subject matter would generally be related to the document ID, remaining constant across releases, while the specific release number, author and release date would be related to the file issued for a particular release. These relationships reflect the logical structure of the real-world data. The two are combined in search results data and the fact that they are stored separately should be invisible to the end user.
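A small sketch of this split, again using sqlite3 with hypothetical table and column names and invented sample data: the title is stored once against the document ID, the author and date are stored per release, and a join presents them together as a search result would.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE document_register (
        doc_id TEXT PRIMARY KEY NOT NULL,
        title  TEXT NOT NULL               -- constant across releases
    );
    CREATE TABLE document_release (
        doc_id       TEXT NOT NULL REFERENCES document_register(doc_id),
        release_id   TEXT NOT NULL,
        author       TEXT,                 -- specific to one release
        release_date TEXT,                 -- ISO 8601 date, specific to one release
        PRIMARY KEY (doc_id, release_id)
    );
    INSERT INTO document_register VALUES ('ABC-123', 'Confined space entry');
    INSERT INTO document_release VALUES ('ABC-123', '1', 'J. Smith', '2021-03-01');
    INSERT INTO document_release VALUES ('ABC-123', '2', 'A. Jones', '2023-07-15');
    """
)

# The join hides the fact that the two kinds of metadata are stored separately.
rows = conn.execute(
    """
    SELECT d.doc_id, d.title, r.release_id, r.author, r.release_date
    FROM document_register AS d
    JOIN document_release AS r ON r.doc_id = d.doc_id
    ORDER BY r.release_date
    """
).fetchall()
```

Correcting the title is then a single-row update in the register, and every release row reflects it.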

Release identifiers are variously referred to as version or revision numbers. (Engineering drawing releases are generally referred to as revisions, and pre-release copies as versions, but the terminology is often reversed in other contexts.) Organizations will have different policies as to how releases are identified. While typically referred to as 'numbers', release identifiers can sometimes be letters, or even dates. The CDMS should not constrain how release identifiers are formatted. The set of release identifiers for a given document ID must also remain a single unified set, for example when the file type of the underlying source files changes.

Basic principles of metadata management dictate that document control metadata must be related to files, not embedded within them. For example, the status of a file as the current latest release or as superseded should be recorded by linking a reference to the file to an entry in a lookup table. The status is changed by updating that link, without accessing or modifying the file, and the status is read without needing to read the file.
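To illustrate, here is a hedged sketch in which status is a reference to a lookup table rather than something written into the file. The names release_status, document_release and set_status are hypothetical, and the byte string merely stands in for real file content.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE release_status (
        status_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL UNIQUE
    );
    CREATE TABLE document_release (
        doc_id     TEXT NOT NULL,
        release_id TEXT NOT NULL,
        status_id  INTEGER NOT NULL REFERENCES release_status(status_id),
        content    BLOB NOT NULL,
        PRIMARY KEY (doc_id, release_id)
    );
    INSERT INTO release_status (status_id, name) VALUES (1, 'current'), (2, 'superseded');
    """
)

original_bytes = b"placeholder for the published file content"
with conn:
    conn.execute(
        "INSERT INTO document_release VALUES ('ABC-123', '1', 1, ?)",
        (original_bytes,),
    )


def set_status(doc_id, release_id, status_name):
    """Re-point the release at a different status row; the file is never opened."""
    with conn:
        conn.execute(
            """
            UPDATE document_release
            SET status_id = (SELECT status_id FROM release_status WHERE name = ?)
            WHERE doc_id = ? AND release_id = ?
            """,
            (status_name, doc_id, release_id),
        )


set_status("ABC-123", "1", "superseded")

status_now, stored_bytes = conn.execute(
    """
    SELECT s.name, r.content
    FROM document_release AS r
    JOIN release_status AS s ON s.status_id = r.status_id
    WHERE r.doc_id = 'ABC-123' AND r.release_id = '1'
    """
).fetchone()
```

A search screen can filter on the status name alone, so a superseded release is excluded without any file being read.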

In keeping with basic data structure principles, the metadata should be in a normalized structure. This ensures data integrity and efficiency by storing each element of data only once, referencing it via keys across related tables.

Storing Files

The principle of ensuring that data remains consistent with the associated files is that of transactional integrity. Specifically, the transactions must be ACID-compliant. Otherwise the data can readily become corrupted. A relational database engine can guarantee transactional integrity, but only if the binary content of the file is under the full control of the database.
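The following sketch shows what this atomicity looks like when the file content is held in the database alongside its metadata. It uses sqlite3 purely as a stand-in for a relational engine. The table, the publish helper and the simulated failure are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE document_release (
        doc_id     TEXT NOT NULL,
        release_id TEXT NOT NULL,
        status     TEXT NOT NULL CHECK (status IN ('current', 'superseded')),
        content    BLOB NOT NULL,
        PRIMARY KEY (doc_id, release_id)
    );
    INSERT INTO document_release VALUES ('ABC-123', '1', 'current', x'00');
    """
)


def publish(doc_id, old_release, new_release, content):
    """Supersede the old release and store the new file as one atomic unit."""
    try:
        with conn:  # one transaction: both statements commit, or neither does
            conn.execute(
                "UPDATE document_release SET status = 'superseded' "
                "WHERE doc_id = ? AND release_id = ?",
                (doc_id, old_release),
            )
            conn.execute(
                "INSERT INTO document_release VALUES (?, ?, 'current', ?)",
                (doc_id, new_release, content),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def current_releases(doc_id):
    rows = conn.execute(
        "SELECT release_id FROM document_release "
        "WHERE doc_id = ? AND status = 'current'",
        (doc_id,),
    ).fetchall()
    return [row[0] for row in rows]


# Simulated failure: the file content is missing, which violates NOT NULL.
failed = publish("ABC-123", "1", "2", None)
after_failure = current_releases("ABC-123")

# A complete publish succeeds and moves the 'current' status in the same step.
succeeded = publish("ABC-123", "1", "2", b"new file bytes")
after_success = current_releases("ABC-123")
```

After the failed attempt release 1 is still current, because the status update was rolled back together with the rejected insert. When the file lives in external storage, an equivalent guarantee has to be built in application code.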

The approaches to storing files and associated metadata have evolved over time. The most recent development is cloud object storage. Cloud object storage offers extreme levels of scalability and other benefits, but has the drawback of compromising transactional integrity compared with frameworks where the file binary content is under the full control of the database. For some solution needs transactional integrity is less important. For a CDMS solution however the integrity of the data is critical. The prospect of users being presented with an obsolete document, or a sensitive and restricted document, due to data corruption caused by a failed transaction for example, would not be acceptable. Transactional integrity with cloud object storage can be managed by the application layer, but it inherently can never be as robust as where it is entirely managed within the relational database engine.

Ultimately a CDMS user must be able to extract a document file from the system. As with document IDs, the CDMS should not constrain the user or organization with regard to how extracted files are named. The filename should generally simply match that given when the file was uploaded.

A SHORT HISTORY OF FILE STORAGE AND RELATIONAL DATABASES

The binary content of files can be stored directly within a binary column within a database table (referred to as a Binary Large Object – BLOB). This approach tends to create scalability and maintainability issues however. Conversely, when file binary data is stored external to the database the database loses its inherent ability to guarantee transactional integrity. If the data is stored external to the database then the application layer must assume responsibility for managing transactional integrity. In 2007/8 Oracle and Microsoft introduced solutions to this problem with SecureFiles and FILESTREAM respectively.

However, SecureFiles and FILESTREAM still present limitations in the context of extremely large and distributed architectures. Hence for very large scale and distributed requirements (e.g. petabyte-scale volumes), cloud object storage services (such as Azure Blob Storage, Amazon S3 and Google Cloud Storage) have been developed as a more cost-effective solution. These services allow multi-tenanted, centrally managed storage at any scale, small or large.

The drawback of cloud object storage is that transactional integrity with an associated relational database must be mediated by an application layer. For solutions where transactional integrity is important, cloud object storage is inherently less robust in its integrity guarantees than options where the file binary data is under the full control of the relational database engine. Cloud object storage also introduces a security framework which is separate from and in addition to that of the relational database. All these challenges are manageable, but the solutions entail compromises. A security and integrity framework comprising an encapsulated single security model with transactional integrity fully assured, all within a single system without involvement of an application layer, is not possible with cloud object storage. However this framework is possible with a solution based on a relational database which retains full control of the file binary data content.

CONTROLLED DOCUMENT MANAGEMENT SYSTEM ARCHITECTURE

Scenarios

Below is a list of scenarios in day-to-day use of controlled document management systems. Some of these are very common.

"I know the document exists. Why can't I find it?"
"I have the number, why can't I find the document?"
"Why are there three document IDs and two different release numbers for the same thing?"
"Why did the revision history change after migration?"
"Why did the revision numbering sequence restart after migration?"
"I want to find a single document but a search using the ID gives me a list of 50, and none of them are the document I'm looking for."
"Why do we need tribal knowledge to find documents?"
"Why do we have to touch every document for this migration?"
"Why can't I reserve a document ID without creating a placeholder file?"
"Why can't I register a document ID with a '/' in the ID string to match a vendor document ID?"
"Why can't I just go to a single document ID search box to find any document of any type?"
"Why can't I perform a simple Document-ID-Equals search?"
"Why does the cross reference printed on the document not match the document ID in the system?"
"The system can't be used for scanned drawings. It can only handle documents containing text."
"How did we end up with several cases of two completely different historical drawings with the same drawing number printed on them able to exist for years in the document management system and no-one knew?"
"Why do we keep finding inconsistencies in IDs and cross referencing that the system has not prevented?"

Do these pathologies derive from implementation shortcomings, do they stem from software architectural limitations, or are they a combination of factors? Before answering that question we first need to take a closer look at how controlled documents work – separately from the question of the software or processes that manage them.

The Real-World Structure of Controlled Documents

Controlled documents are a special class of document governed by particular rules. The outworking of those rules is that controlled documents inherently form a hierarchical structure of data and objects. Below is a conceptual example of the structure of a controlled document. It is independent of the software that might be used.

    Document Number [2415-004-C-102]
    │
    ├── Document Metadata (Title, Subject, etc.)
    │
    ├── Rev 2
    │   ├── Release Metadata (Date, Author, etc.)
    │   ├── PDF Rendition File
    │   └── CAD Source File
    │
    ├── Rev 1
    │   ├── Release Metadata
    │   ├── PDF Rendition File
    │   └── CAD Source File
    │
    ├── Rev 0
    │   ├── Release Metadata
    │   ├── PDF Rendition File
    │   └── CAD Source File
    │
    ├── Draft Rev B
    │   ├── Release Metadata
    │   ├── PDF Rendition File
    │   └── CAD Source File
    │
    └── Draft Rev A
        ├── Release Metadata
        ├── PDF Rendition File
        └── CAD Source File

In the above example a cross reference to 2415-004-C-102 points to '2415-004-C-102 Rev 2 PDF Rendition File + Rev 2 Release Metadata + Document Metadata', unless it refers to a specific revision, in which case it would read, for example, '2415-004-C-102 Rev 1'. This is important. A cross reference to a drawing or a procedure does not refer to a static file. It refers to the latest release of that series, in its published or authoritative rendition. When a new release is published the 'pointer' logically moves and refers to a different object. The document is not a tangible object as such. It is conceptually an abstract container comprising a set of objects. Rules apply to those objects, in particular to their lifecycle state. There might be a Revision 3, for example, which is still in draft form and hence is not yet current.
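The moving 'pointer' can be sketched as a query rather than a stored link. In this illustrative example (sqlite3, hypothetical names throughout), each release has a free-format identifier plus a separate seq_no column that records its order within the series. The seq_no column is an assumption introduced here so that a series such as A, B, 0, 1, 2 can be ordered without interpreting the identifiers themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE document_release (
        doc_id     TEXT NOT NULL,
        release_id TEXT NOT NULL,     -- free format: letters, numbers, dates
        seq_no     INTEGER NOT NULL,  -- position in the series (assumed column)
        state      TEXT NOT NULL CHECK (state IN ('draft', 'released')),
        PRIMARY KEY (doc_id, release_id),
        UNIQUE (doc_id, seq_no)
    );
    INSERT INTO document_release VALUES
        ('2415-004-C-102', 'A', 1, 'draft'),
        ('2415-004-C-102', 'B', 2, 'draft'),
        ('2415-004-C-102', '0', 3, 'released'),
        ('2415-004-C-102', '1', 4, 'released'),
        ('2415-004-C-102', '2', 5, 'released'),
        ('2415-004-C-102', '3', 6, 'draft');
    """
)


def resolve(doc_id, release_id=None):
    """Return the release a cross reference points to, or None if there is none."""
    if release_id is not None:
        # The reference names a specific revision, e.g. '2415-004-C-102 Rev 1'.
        row = conn.execute(
            "SELECT release_id FROM document_release "
            "WHERE doc_id = ? AND release_id = ?",
            (doc_id, release_id),
        ).fetchone()
    else:
        # A bare reference means the latest release that has actually been issued.
        row = conn.execute(
            "SELECT release_id FROM document_release "
            "WHERE doc_id = ? AND state = 'released' "
            "ORDER BY seq_no DESC LIMIT 1",
            (doc_id,),
        ).fetchone()
    return row[0] if row else None


latest = resolve("2415-004-C-102")
specific = resolve("2415-004-C-102", "1")
unknown = resolve("NO-SUCH-ID")
```

Here the bare reference resolves to Rev 2, since Rev 3 is still a draft. Once Rev 3 is released the same call returns it, with no cross reference having been edited.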

The above structure is quite different from that of arbitrary files that are typically found in corporate network folders, or files related to enterprise content management tasks such as invoices, correspondence or customer records. Files for purposes other than that described above are required to obey different sets of rules.

Mapping From The Outside In

The internal representation within the software system must somehow map the structure and functionality of controlled documents described earlier. The most effective approach will be one where the key business rules and entity relationships are natively mapped by the underlying schema. The greater the difference between the database schema and the real-world relationships the greater will be the burden on programmatic layers to fill the gaps. Complexity proliferates. There is a point however where no amount of programming can fill the gap. This is why in some cases seemingly simple user expectations (e.g. "Give me a Document-ID-Equals search function that will return exactly one result") ultimately become intractable because the underlying schema does not support it.

Hence the key design strategy requirement is this:

The system must be a faithful reflection of reality.

By 'reality' we mean the business logic model – the rules and implied expectations that the system needs to abide by.

The system does not generate reality – its job is to map it. Having a perfect schema doesn't guarantee that the user interface will work. What it guarantees is that it can be made to work. If the schema does not match reality however, then potentially the user interface can never be made to work how it should.

Critical Success Factors

Controlled document management systems exist to support outcomes in the real world. For example: People have access to the right procedures to follow to keep them safe. Designers have access to company standards so they can produce consistent designs to the required quality. The Finance team has a clear set of governance rules to work from. Workers can readily find accurate asset drawings essential for performing a task. Fabricators have the right drawings to manufacture the correct part or equipment.

The purpose of audits and certifications is to help ensure that the real-world outcomes are not compromised by shortcomings in the system that delivers critical documents to people. The purpose of the system is not to pass an audit. The purpose of the system is to support real-world outcomes. The purpose of the audit is to assess the system's effectiveness.

The end focus of a document management system is the users who read and apply the documents – document consumers. Without document consumers the entire system is superfluous. If the needs of document consumers are not being met then the system fails in its primary purpose. The most common manifestation of this is that users bypass the system entirely and create their own satellite repositories of documents – not because the users are lazy or untrained but because the system is ineffective.

Hence a key success criterion for a controlled document management system is this:

The system must serve the people using the documents.

That being the case, the most basic requirement of people using documents is this:

I need document ABC-123. Find it.

If the user becomes frustrated with the response to that simple need, then the system is not serving the users. They lose trust. If 'Find document ABC-123' returns more than one record because the system is actually performing a search of 'Find any document which contains the text ABC-123 or has that string in any metadata field' then the system is creating confusion and increasing cognitive load.

Further, if the users expect that Revision 6 should be followed by Revision 7, then a software migration that results in Revision 6 being followed by Version 1 is not serving the users. If users expect that a document should only be known by one document ID, being the ID displayed on the document and used in cross references in other documents, yet within the management system it is known by three different IDs, then the system is not serving the users.

It is important to note here that there is often a large disconnect between what the document control-related regulatory standards actually mandate and what many software vendors and some audit practitioners claim those standards require of document management software. The standards don't contradict how users expect systems to behave; the software does. Effective software respects the boundaries between the business domain and the technology used to implement it.

Consumption-Centric Design

A document consumer-centric design gives precedence to the following question:

How does a user find the document?

Ensuring this requirement is met may appear trivial. With the right architectural model and design philosophy it is trivial. But the common pathologies listed earlier demonstrate that the primacy of this requirement is often lost sight of. The needs of the document consumer appear to be an afterthought and are subordinated to other considerations, usually workflow control or collaboration related, or perhaps ease of development interests.

Overall, the question of effective consumption-centric design is not just about achieving positive document consumer end user buy-in. It is about avoiding exactly the types of incidents that a certified and audited document management system is supposed to prevent.

Bad outcomes occur in the real world when:

System outputs are inconsistent
System outputs don't match real world reality
System data doesn't match what is actually printed on the document
System data doesn't logically work in the way users naturally expect
Systems don't behave like end users reasonably expect them to
End users get confused
End users get frustrated
End users cannot find the information they need
End users do not have confidence in the system and work to bypass it

But most seriously, a poor system can force users to either make assumptions or make errors. Assumptions and errors can be precursors to accidents.

Cognitive Load

One of the purposes of software systems is to reduce cognitive load. If the solution to a problem is that users need to be educated then the system is increasing cognitive load, not reducing it.

Here is an example. A user finds a cross reference to a document. They perform a search using that ID and it returns a long list of documents. After some time spent searching through the listed documents they eventually ask a colleague and discover that the document they were looking for had been made obsolete. None of the documents returned was the document in question; they were other documents that merely still contained cross references to the obsolete document. The user wanted to search by document ID, but instead the system entered the string into a fuzzy full-text content search. The correct response to the query should have been either zero records or a record indicating that the document was obsolete.
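The difference between the two behaviors can be shown in a few lines. This is a sketch with hypothetical table contents, using sqlite3. The LIKE query is only a rough stand-in for a fuzzy content search, not a model of any particular search engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE document_register (
        doc_id TEXT PRIMARY KEY NOT NULL,
        title  TEXT NOT NULL,
        status TEXT NOT NULL CHECK (status IN ('current', 'obsolete'))
    );
    INSERT INTO document_register VALUES
        ('ABC-123', 'Lifting procedure', 'obsolete'),
        ('ABC-456', 'Rigging standard (refer to ABC-123)', 'current');
    """
)


def find_by_id(doc_id):
    """Document-ID-Equals search: exactly one row, or None. Never a list."""
    return conn.execute(
        "SELECT doc_id, title, status FROM document_register WHERE doc_id = ?",
        (doc_id,),
    ).fetchone()


def fuzzy_search(text):
    """Substring match over several fields, as a crude stand-in for content search."""
    pattern = f"%{text}%"
    return conn.execute(
        "SELECT doc_id FROM document_register "
        "WHERE doc_id LIKE ? OR title LIKE ? ORDER BY doc_id",
        (pattern, pattern),
    ).fetchall()


exact = find_by_id("ABC-123")      # one record, clearly marked obsolete
missing = find_by_id("ABC-999")    # no such document: zero records
fuzzy = fuzzy_search("ABC-123")    # also returns the document that merely cites it
```

The exact lookup depends on the register holding each ID once. With that in place, the answer to 'Find document ABC-123' is a single row or nothing.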

Patterns

The scenarios described earlier might be viewed as isolated issues arising from search configuration, migration activities, user training, metadata quality, or individual implementation decisions. While these factors can contribute, such issues frequently occur together and persist across different systems, organizations, products and technology generations. But there is a common theme.

All of the complaints and scenarios listed earlier can be summed up in one statement.

The behavior of the system does not align with reality.

That is the pattern.

Hence the summary of the problem is this:

Reality is consistent. Some systems are better than others at aligning with reality.

The users are not misguided. Their expectations are not obscure or unreasonable. They don't need to be educated. The problem is that the system simply does not match the reality of how controlled documents are supposed to work.

Root Causes

All of the pathologies described earlier can be traced back to one or more of three causes related to the software.

System-Centric as opposed to User-Centric design
Workflow and Collaboration-Centric as opposed to Document Consumer-Centric design
Object-Abstracted as opposed to Register-Centric design

An example of a system-centric approach is a notion that if the system is certified and passes an audit then all is good. User complaints are a distant secondary consideration to auditor complaints. The system might superficially appear to be fully compliant but hidden edge cases exist which breach core controlled document management principles. Or perhaps the system and all its content is in perfect order but the user experience is so terrible that end users go out of their way to bypass the system entirely.

Most document management systems are workflow and collaboration-centric, or file sharing-focused. These systems prioritize the process over the product. The system is designed for high-velocity, rigorously controlled creation and editing of documents. The system is explicitly designed to prevent users from following a different workflow management system outside of the document management software, even if it could actually be more efficient. Sometimes the software is also designed to prevent users from having control over key metadata such as the release identifier, e.g. whether it is a number or letter or both, whether it is called a Revision or a Version, where it starts, or whether it is allowed to change format mid-series from a letter sequence to a number sequence. Sometimes a proxy field might be offered as a workaround, sometimes not. Typically, ease of collaborative editing is the key focus, while the ability of end users to find what they need, and the alignment of system behavior with users' expectations of how the business rules should work, are just assumed to be taken care of. But often they are not. Many document control systems are essentially collaboration and file sharing tools pretending to be document registers or controlled document management systems. They are file catalogs with workflow engines.

The third root cause of controlled document system pathologies is deeply architectural, but it can be identified very simply: there is no document register table. This root cause revolves around a question of identity.

What is the primary identity?
Who owns it?
How is it resolved?

Commonly the document ID is treated as merely a peer metadata field alongside other metadata. The primary identity is not the document ID that the user deals with. The primary identity is an object identifier that is decoupled and abstracted from the document ID. The key difference is this:

The user-facing document ID is treated as a subordinate identity. It is not the primary identifier.
The document ID does not have its own namespace.
The primary ID is owned by the software, not the business domain.
There is not a rigid 1:1 relationship between the document ID and the underlying primary ID.
Most importantly, the user-facing document ID field does not have its own column with a unique index constraint in a dedicated table, i.e. there is no master register of the user-facing document ID.

Without a unique namespace for the document ID, the system is fundamentally incapable of being a System of Record (source of truth).

The document ID that the user sees printed on a document is what can be called a Natural Key. By contrast, a GUID, for example, is what can be called a Surrogate Key. The problem is not the use of a database-level surrogate key as opposed to a natural key. The problem is a poorly designed surrogate key system in which the surrogate key is an authority elevated above the document numbers printed on the documents, which is an inversion of authority. This flatly contradicts the real-world user experience of document identity. The result is identity confusion. In data modeling, if a surrogate key must be used then the surrogate key should be the Primary Key for internal indexing, but the natural key must have a Unique Constraint for business authority.
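That last rule can be sketched as follows, assuming hypothetical table and column names and using sqlite3: an integer surrogate key links rows internally, while the user-facing document ID keeps its own unique constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE document_register (
        doc_pk INTEGER PRIMARY KEY,   -- surrogate key, internal use only
        doc_id TEXT NOT NULL UNIQUE   -- natural key, as printed on the document
    );
    CREATE TABLE document_release (
        doc_pk     INTEGER NOT NULL REFERENCES document_register(doc_pk),
        release_id TEXT NOT NULL,
        PRIMARY KEY (doc_pk, release_id)
    );
    """
)

with conn:
    pk = conn.execute(
        "INSERT INTO document_register (doc_id) VALUES ('ABC-123')"
    ).lastrowid
    conn.execute("INSERT INTO document_release VALUES (?, '1')", (pk,))
    # Renumbering changes the natural key; the releases stay attached via doc_pk.
    conn.execute(
        "UPDATE document_register SET doc_id = 'ABC-124' WHERE doc_pk = ?", (pk,)
    )

# A second document cannot claim an ID that is already in the register.
try:
    with conn:
        conn.execute("INSERT INTO document_register (doc_id) VALUES ('ABC-124')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

releases = conn.execute(
    """
    SELECT r.release_id
    FROM document_release AS r
    JOIN document_register AS d ON d.doc_pk = r.doc_pk
    WHERE d.doc_id = 'ABC-124'
    """
).fetchall()
```

In this arrangement the surrogate key never reaches the user. Every lookup starts from the document ID, and the one-to-one mapping between the two keys is enforced by the schema.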

What this object-abstracted architecture (based on object-oriented principles) means is that the inherent real-world structure of controlled documents described earlier does not exist at the schema level. All the functionality required to make controlled document IDs behave as they should must be managed programmatically. Furthermore, the primary identity is owned by the software, not the organization.

One common manifestation of this architecture shortcoming is that document ID uniqueness is not natively enforced – a special script must be added to the application to enforce that essential rule. This is symptomatic of a system that was not designed with controlled document management in mind at the outset, and the underlying schema does not align with the real-world entity relationship structure of controlled documents. It lacks declarative integrity. A customized workaround configuration is needed to implement the most fundamental of controlled document management rules: unique user-facing document identity. By forcing uniqueness to be managed via application scripts instead of unbypassable schema-level unique constraints, the system invites data corruption and rejects its role as a System of Record.

In some cases there is a central document ID register of GUIDs (which are technical identifiers, not business identifiers) with a folder-based management model built on top of it. When user-facing identity becomes defined by a path rather than a unique single ID string, confusion arises. In these types of systems the resolving of a document is largely tied to where it is rather than what it is or who it is. For example, a document may appear to have been deleted when actually it has been 'moved'.

Systems which are built around location-centric, folder-based management philosophies (even if the files are not actually stored in real folders) commonly create broken links, orphaned files, access denials and other problems which hamper users trying to locate information, exactly the types of pathologies described earlier. In a register-centric system (i.e. a system with a central lookup table or primary entity master table) the resolving of the document always comes back to who it is. The who that the database knows is exactly the same who that the user knows. Resolving something primarily by where it is, as opposed to what or who, flows from a pre-computer-era mindset. Data systems transitioned from hierarchical models to relational models (i.e. SQL) in the 1980s. Hierarchical folders can be helpful visual aids, which is why they are commonly used. But as a data structure for managing controlled documents, or metadata in general, they are a poor map of reality.

The concept of a primary identifier that is abstracted from the document ID can help solve various document control problems, such as changing document IDs, resolving collisions or merging systems. But in general this architecture did not necessarily evolve as a solution to controlled document management problems. In most cases it probably arose as a by-product of an object-oriented design approach, as opposed to a relational model design approach, at a time when industry viewed object orientation as the solution to every architectural problem. When an object-oriented design prioritizes developer convenience by abstracting the schema behind opaque surrogate keys, it often inadvertently creates a 'black box' database that lacks the declarative integrity (e.g. natively enforced uniqueness) and semantic clarity (accurately mirroring the real world) required for long-term operational transparency and reliable data analysis. The system also becomes application-centric rather than data-centric. This tension is often referred to as the object-relational impedance mismatch.

Most documents on corporate networks do not fit into a rigid structured framework. They can be sorted, grouped, arranged and found using metadata and content searching. In some cases a schema was designed for that purpose, and the requirements of controlled documents were then constructed as an additional layer. This is one part of the problem. When this is combined with an object-abstracted schema model, the data's accessibility and transparency are inadvertently sacrificed, resulting in an architectural mismatch. This is why simple needs such as "I just want a single search field that will deterministically search all document IDs, of all types" can become problematic. If there is a single global document ID register then implementing this request is trivial. But when the primary identifier has been abstracted away and there is no master document ID register at the schema level, functionality starts to diverge from expectations. The business identity has been abstracted to the point of operational dysfunction. The end result is a system that serves its own architecture rather than the people using the documents.
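To illustrate why the single search field is trivial when one global register exists, here is a minimal sketch using Python's built-in sqlite3 module. The schema, document types and IDs are hypothetical. The lookup is an exact match on the ID, so it returns either one row or nothing, regardless of document type.

```python
import sqlite3
from typing import Optional, Tuple

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE document_register (
        doc_id   TEXT NOT NULL PRIMARY KEY,  -- one register for every document type
        doc_type TEXT NOT NULL,
        title    TEXT NOT NULL
    )
    """
)
with conn:
    conn.executemany(
        "INSERT INTO document_register (doc_id, doc_type, title) VALUES (?, ?, ?)",
        [
            ("SOP-0042", "procedure", "Pump isolation procedure"),
            ("DWG-1001", "drawing", "Pump layout drawing"),
            ("POL-0007", "policy", "Document control policy"),
        ],
    )


def find_document(doc_id: str) -> Optional[Tuple[str, str, str]]:
    """Exact-match lookup on the ID: one row or None, never a ranked list."""
    return conn.execute(
        "SELECT doc_id, doc_type, title FROM document_register WHERE doc_id = ?",
        (doc_id.strip(),),
    ).fetchone()


hit = find_document("DWG-1001")
miss = find_document("DWG-9999")
```

There is no per-type search field and no dependence on file content or location, because every document ID lives in the same table.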

The problem gets worse if either:

The software 'owns' and rigidly dictates the document ID and users are forced to follow its system and use that ID as the ID printed on documents, or
For convenience reasons the organization decides to migrate existing document IDs to the software-generated system for the current generation of preferred software vendor, and then again some years later the next time the business changes software vendor.

The problem can be compounded even further by policy decisions such as dictating a metadata-structured ID system. An example is a numbering structure that reflects the organizational structure, and which therefore has to change whenever the organizational structure changes.

If the software uses a simple integer sequence number as the document ID, and document IDs are migrated to that system, then collisions are almost guaranteed to occur in the event of, for example, a corporate merger. GUIDs sit at the opposite extreme and make collisions practically impossible. But as mentioned earlier, GUIDs are not suitable as user-exposed identifiers because they are not human-friendly.
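The merger scenario can be sketched in a few lines. The two registers and their sizes are hypothetical. The sketch assumes each company's software issued document IDs as a plain sequence starting at 1.

```python
import uuid

# Hypothetical registers: each company's software issued IDs 1, 2, 3, ...
company_a_ids = set(range(1, 5001))  # 5,000 documents
company_b_ids = set(range(1, 3001))  # 3,000 documents

# Every ID issued by the smaller company already exists in the larger one.
collisions = company_a_ids & company_b_ids
collision_count = len(collisions)

# A GUID avoids the collision, but it is impractical to print on a drawing,
# read aloud over a radio or type into a search field.
example_guid = str(uuid.uuid4())
guid_length = len(example_guid)
```

Under these assumptions every one of the smaller company's documents collides, which means each of them would need a new ID, and every printed cross-reference to them would become wrong.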

Solutions which require documents to be changed, not to fix errors on the documents but to cater for constraints of the system, are symptomatic of a system that was not designed to mirror the real world.

Why This Matters

The benefits of the document register-centric approach are substantial, particularly in environments containing large volumes of long-lived technical documentation. Engineering drawing repositories, plant documentation systems, infrastructure records and asset information systems often contain hundreds of thousands of documents accumulated over decades. Some of those documents that are decades old can still be highly relevant and valuable. In these environments document numbers are the primary navigation keys used by engineers, operators, maintainers and contractors. Preserving the integrity of those references is vital to the end users served by the document management system.

A document register-centric model provides:

Consistent Retrieval

Users search using the document number they see on the document.
No special knowledge is required.
No alternate search fields are required.
No understanding of historical numbering changes is required.
No understanding of fragmented register systems is required.
No understanding of a structured key document ID system is required.
No dependency on text contained within the file is required.
No dependency on knowing 'where' the document is located is required.
The search will always be deterministic.

Reliable Cross-Referencing

Every cross-reference printed on every document points to a unique, identifiable document.
The relationship between documents remains clear regardless of system migrations or technology changes.
Every document is identified by a single unique identifier.

Elimination of Metadata Workarounds

There is no need for legacy identifier fields, proxy identifiers or numbering translation layers.
The system manages document identity directly rather than attempting to compensate for identity problems indirectly.

Improved Long-Term Integrity

The document set remains self-consistent even if the management system changes.
Documents retain their meaning and relationships independently of any particular software platform. This is particularly important for asset information that must remain usable for decades.

Alignment with User Expectations

Most importantly, the system behaves the way users naturally expect it to behave.
The identifier printed on the document is the identifier used everywhere.
The release identifier sequences continue to progress according to the existing system. There is no surrogate release number. The series does not get broken.
No translation is required between what users see and what the system stores.
There are no requirements imposed on users that are artifacts of misalignments between the software data schema and how the real world actually works.
There is no requirement to revise and re-publish documents purely to cater for constraints of the software.

Integrity-Driven Compliance

Compliance is driven by the underlying system integrity and is less reliant on process-level controls and audit trails. Compliance matters. It should be the natural result of an immutable register, not an add-on layer of scripts and logs.

Under this model, document identity is derived from the documents themselves, not from an internal software construct. The document identifier is also expected to be immutable. An ID may be canceled and superseded by a newly published document that bears a new ID, but it cannot be changed or substituted with a surrogate.
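One possible way to express the immutability rule at the schema level is sketched below with Python's built-in sqlite3 module. The table, column and trigger names are hypothetical, and a trigger is only one of several mechanisms that could be used. Superseding adds a new row and links the old one to it, while any attempt to rewrite an issued ID is rejected by the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE document_register (
        doc_id        TEXT NOT NULL PRIMARY KEY,
        status        TEXT NOT NULL DEFAULT 'current',
        superseded_by TEXT REFERENCES document_register(doc_id)
    );

    -- Reject any attempt to rewrite an ID that has already been issued.
    CREATE TRIGGER doc_id_is_immutable
    BEFORE UPDATE OF doc_id ON document_register
    BEGIN
        SELECT RAISE(ABORT, 'doc_id is immutable');
    END;
    """
)
with conn:
    conn.execute("INSERT INTO document_register (doc_id) VALUES ('STD-0100')")


def supersede(old_id: str, new_id: str) -> None:
    """Issue a new ID and mark the old one superseded; the old ID is never altered."""
    with conn:
        conn.execute("INSERT INTO document_register (doc_id) VALUES (?)", (new_id,))
        conn.execute(
            "UPDATE document_register SET status = 'superseded', superseded_by = ? "
            "WHERE doc_id = ?",
            (new_id, old_id),
        )


def try_rename(old_id: str, new_id: str) -> bool:
    """Attempt an in-place ID change; return False when the schema refuses it."""
    try:
        with conn:
            conn.execute(
                "UPDATE document_register SET doc_id = ? WHERE doc_id = ?",
                (new_id, old_id),
            )
        return True
    except sqlite3.DatabaseError:
        return False


renamed = try_rename("STD-0100", "STD-0100-NEW")
supersede("STD-0100", "STD-0200")
rows = conn.execute(
    "SELECT doc_id, status, superseded_by FROM document_register ORDER BY doc_id"
).fetchall()
```

After this runs, both IDs remain in the register, so a cross-reference to the old ID still resolves and leads the reader to its successor.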

Note that the document ID is distinct from the file ID. File IDs are not what is cross-referenced from other documents. File IDs do not (or should not) appear on the printed document. Hence files can be assigned any arbitrary unique identification system, including sequence numbers or GUIDs. File IDs are referenced only within the controlled document management system, where the file is an object subordinate to the document ID. This distinction between the file and the abstract document is a critical concept.
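The separation between the abstract document and its files can be sketched as a small data model. The class names, field names and example values are hypothetical. Each file receives an arbitrary internal identifier (a GUID here), while the document keeps the single ID that is printed on it.

```python
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass
class StoredFile:
    """A physical file; its ID is internal and never printed on the document."""
    filename: str
    file_id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class Document:
    """The abstract controlled document, identified by its user-facing ID."""
    doc_id: str
    title: str
    files: List[StoredFile] = field(default_factory=list)

    def attach(self, filename: str) -> StoredFile:
        """Add a file under this document; the document ID is unaffected."""
        stored = StoredFile(filename)
        self.files.append(stored)
        return stored


doc = Document("DWG-1001", "Pump layout drawing")
native = doc.attach("pump_layout.dwg")     # native CAD file
rendition = doc.attach("pump_layout.pdf")  # viewable rendition of the same document
```

Both files hang off the same document ID, so cross-references from other documents keep pointing at DWG-1001 no matter how many files are added or replaced, or how those files are identified internally.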

Conclusions

A document register-centric system recognizes the primacy of the documents. The system exists to support management of controlled documents.

The distinction between object-abstracted (or file-centric or folder-centric) systems and document register-centric systems fundamentally changes how document identity, cross-referencing, revision control and information retrieval are managed.

In a document register-centric model the management system does not define and own document identity or release identifier designations. It recognizes and preserves the identity that already exists within the documents themselves, or whatever the user wants to define it as. This single principle eliminates many of the inconsistencies, workarounds and user frustrations that have become accepted as normal within the document management industry.

A user-centric design prioritizes the focal endpoint of any document management system: the document consumer who must find and apply the document.

The quality of a controlled document system should be evaluated primarily by how reliably document consumers can identify, locate, trust, and use documents using the identifiers and relationships that exist within the document set itself.

Silkwood Software

July 2026