FUNDAMENTALS OF CONTROLLED DOCUMENT MANAGEMENT

WHAT IS A CONTROLLED DOCUMENT

Controlled documents have characteristics that differentiate them from other types of documents. The management of these documents is governed by strict rules, often dictated by regulatory or compliance obligations. Examples of controlled documents include standards, procedures and policies.

The following are typical key characteristics of controlled documents.

  1. Review and approval. All controlled documents are subjected to review and approval. A document must be formally approved before it can be released and used.
  2. Formal issuance and withdrawal. Controlled documents are formally issued for use, and withdrawn from use when no longer valid.
  3. Currency status and distribution control. All controlled documents must have a currency status, such as current, superseded, obsolete or cancelled. Controlled documents that are no longer current must be withdrawn from use so that they are not inadvertently used.
  4. Issue history. Different issues or releases of a controlled document must be uniquely identified (e.g. with a revision or version number).
  5. Directive authority. Controlled documents, such as standard operating procedures, typically carry directive authority. That is, the document gives instructions that must be followed.
  6. Periodic review. Controlled documents that carry directive authority, such as standards, procedures and policies, must be reviewed on a periodic schedule and revised and re-issued as appropriate. Engineering drawings are also a class of controlled document, but they are generally revised in response to other triggers rather than reviewed on a time-based schedule.
  7. Unique identification. Controlled documents must be uniquely identifiable. Generally this is via a document ID, being a string of characters, a sequence number or a combination thereof.

Effectively managing controlled documents requires a system specifically designed to handle these unique characteristics.

ESSENTIAL REQUIREMENTS OF A CONTROLLED DOCUMENT MANAGEMENT SYSTEM

Structure and Relationships

The characteristics of controlled documents dictate that they must be managed in a system which has structure. Furthermore, the management of relationships between different entities must be embodied within the structure. For example, the different releases of a particular document are all related to the same document ID. At the same time, the document ID must be unique.

To effectively manage these relationships a robust data management platform is essential. Relational databases are the most appropriate platform for this purpose. A relational database natively enforces referential integrity and manages relationships. To effectively manage controlled documents in a relational database the database structure must be designed to mirror the real-world structure of how controlled documents must be managed.

Document Identification

A fundamental rule of controlled document management systems (CDMS) is that each document must be uniquely identified. The CDMS must enforce the uniqueness of each document ID. The most effective way to implement this requirement is via a single register of document IDs. Hence the first requirement of a CDMS is that it must contain a single register of document IDs, with an associated constraint that all entries must be unique.

The document ID which must be unique is that which is displayed on the published document. Document IDs are cross-referenced in the text of documents, and they must be able to be readily verbally communicated and transcribed, i.e. they must be human-friendly. For this reason GUIDs (globally unique identifiers) are unsuitable for use as document IDs, even though they can be guaranteed to be unique without needing to be managed in a single register. The identifiers of national and international standards (e.g. ISO 9001, IEC 60167, IEEE 1584, ANSI Z535.1) are good examples of document IDs.

Conceptually a document ID is an abstract entity separate from the files which represent the actual document object. With a dedicated document ID register the function of reserving document IDs, for example, is straightforward. There is no need to supply or create dummy files in order to reserve document IDs.
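The single register and its uniqueness rule can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 module, not a reference implementation. The table name document_register, its columns, and the helper reserve_id are hypothetical names chosen for the example.

```python
import sqlite3


def open_register() -> sqlite3.Connection:
    """Create an in-memory register. Table and column names are illustrative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE document_register (
            doc_id TEXT NOT NULL UNIQUE,            -- the ID shown on the published document
            status TEXT NOT NULL DEFAULT 'reserved'
        )
        """
    )
    return conn


def reserve_id(conn: sqlite3.Connection, doc_id: str) -> bool:
    """Reserve an ID with no file attached. Return False if the ID already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO document_register (doc_id) VALUES (?)", (doc_id,)
            )
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint, not application logic, rejected the duplicate.
        return False


conn = open_register()
first = reserve_id(conn, "ABC-123")
duplicate = reserve_id(conn, "ABC-123")
count = conn.execute("SELECT COUNT(*) FROM document_register").fetchone()[0]
```

In this sketch the second reservation fails because the database rejects it, and no dummy file is needed to hold the reserved ID.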

For various reasons (such as inflexibility, explained in more detail elsewhere on the web) embedding metadata into document IDs is poor practice. The advent of relational databases many decades ago obviated the need to use a structured document ID as the primary means of searching, sorting and grouping documents. The only structure which may be applicable is one which indicates the namespace or originating register (e.g. ISO, IEC, IEEE, ANSI), ensuring its unique context.

Although metadata should not be embedded within a document ID, the CDMS itself should not impose constraints on the format of the document ID beyond enforcing uniqueness. Most organizations will use some type of document ID format. The organization should be able to decide on the document ID format independently of the CDMS. The CDMS should also maintain only a single document ID register and not maintain a parallel or proxy document ID register. To guarantee that no duplicate document identifiers can be created, the CDMS must not maintain separate registers for different document types, since that would allow the same identifier to be configured in more than one register.

Relationship Between Documents and Files

While enforcing the uniqueness of the document ID, the CDMS must also cater for the relationship structure that a single document ID is related to a series of releases. Within each release there will typically be two renditions. For example, there will be a source file (e.g. .docx or .dwg) and a published file (.pdf). Theoretically, the published rendition of a single release may comprise multiple files, such as a main document and a separate appendix file in a different format. The document ID is not a metadata attribute of the file. The file is related to the document ID. The document ID is the parent and the file is the child. This logical separation of the document ID from the physical files is critical, allowing for flexible management of various renditions and formats over time. The CDMS must be flexible enough to handle these requirements and mirror the real-world nature of the data. Relational databases are specifically designed to efficiently manage these types of relationships.
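The parent and child relationships described above might be expressed in a schema along the following lines. This is a sketch only, using sqlite3 for convenience. The table names (document, doc_release, doc_file), the column names, the filenames and the helper functions are all assumptions made for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE document (
    doc_id TEXT PRIMARY KEY            -- parent: the ID printed on the document
);
CREATE TABLE doc_release (
    release_pk INTEGER PRIMARY KEY,
    doc_id     TEXT NOT NULL REFERENCES document(doc_id),
    release_id TEXT NOT NULL,          -- e.g. 'A', '0', '1'; format is not constrained
    UNIQUE (doc_id, release_id)
);
CREATE TABLE doc_file (
    file_pk    INTEGER PRIMARY KEY,    -- arbitrary internal file identifier
    release_pk INTEGER NOT NULL REFERENCES doc_release(release_pk),
    rendition  TEXT NOT NULL CHECK (rendition IN ('source', 'published')),
    filename   TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(SCHEMA)

conn.execute("INSERT INTO document VALUES ('ABC-123')")
rel = conn.execute(
    "INSERT INTO doc_release (doc_id, release_id) VALUES ('ABC-123', '0')"
).lastrowid
conn.executemany(
    "INSERT INTO doc_file (release_pk, rendition, filename) VALUES (?, ?, ?)",
    [
        (rel, "source", "procedure.docx"),
        (rel, "published", "procedure.pdf"),
        (rel, "published", "appendix.xlsx"),  # a release may publish several files
    ],
)


def files_for(doc_id: str, release_id: str) -> list[tuple[str, str]]:
    """Return (rendition, filename) pairs for one release of one document."""
    rows = conn.execute(
        """
        SELECT f.rendition, f.filename
        FROM doc_file f JOIN doc_release r ON r.release_pk = f.release_pk
        WHERE r.doc_id = ? AND r.release_id = ?
        ORDER BY f.file_pk
        """,
        (doc_id, release_id),
    )
    return rows.fetchall()


def orphan_file_rejected() -> bool:
    """A file row cannot exist without a parent release."""
    try:
        conn.execute(
            "INSERT INTO doc_file (release_pk, rendition, filename) "
            "VALUES (999, 'source', 'x.dwg')"
        )
        return False
    except sqlite3.IntegrityError:
        return True
```

The document ID appears only in the parent tables. A file reaches its document ID through its release, so the ID is never an attribute of the file itself.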

User Interface and Control

Document search results must be presented to end users in such a manner that users are not likely to inadvertently use a document which is not the latest release of a current document. Documents which have been cancelled, or releases which have been superseded, should not be presented without additional deliberate actions by the user or perhaps specific authorization. The most efficient way to handle these requirements is via metadata.
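As an illustration of metadata-driven filtering, the sketch below returns only the latest release of current documents unless the caller deliberately opts in to everything else. The schema, the status values ('current', 'cancelled', 'latest', 'superseded'), the sample rows and the search function are hypothetical. A real system would also apply authorization checks, which are omitted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE document (
        doc_id     TEXT PRIMARY KEY,
        title      TEXT NOT NULL,
        doc_status TEXT NOT NULL          -- 'current', 'cancelled', ...
    );
    CREATE TABLE doc_release (
        doc_id         TEXT NOT NULL REFERENCES document(doc_id),
        release_id     TEXT NOT NULL,
        release_status TEXT NOT NULL,     -- 'latest', 'superseded', 'draft'
        PRIMARY KEY (doc_id, release_id)
    );
    INSERT INTO document VALUES ('ABC-123', 'Isolation procedure', 'current');
    INSERT INTO document VALUES ('ABC-124', 'Old permit form', 'cancelled');
    INSERT INTO doc_release VALUES ('ABC-123', '1', 'superseded');
    INSERT INTO doc_release VALUES ('ABC-123', '2', 'latest');
    INSERT INTO doc_release VALUES ('ABC-124', '0', 'latest');
    """
)


def search(title_fragment: str, include_non_current: bool = False) -> list[tuple[str, str]]:
    """Return (doc_id, release_id) pairs. Non-current items need a deliberate opt-in."""
    sql = """
        SELECT d.doc_id, r.release_id
        FROM document d JOIN doc_release r ON r.doc_id = d.doc_id
        WHERE d.title LIKE ?
    """
    if not include_non_current:
        # Default view: only the latest release of documents that are current.
        sql += " AND d.doc_status = 'current' AND r.release_status = 'latest'"
    sql += " ORDER BY d.doc_id, r.release_id"
    return conn.execute(sql, (f"%{title_fragment}%",)).fetchall()
```

With the sample rows, a default search for "procedure" returns only release 2 of ABC-123, and a default search for "permit" returns nothing because ABC-124 is cancelled.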

Given that users must be able to download document files for various reasons, a strategy must be applied to reduce the risk of non-current documents being used. The simplest and most efficient strategy for ensuring that users are only referring to the current release of a document is to explicitly communicate on each document that the master copy resides in a given nominated single repository. Documents viewed through channels outside of that repository are defined as uncontrolled by default. Thus, the designation of what constitutes a 'controlled copy' is not by what is displayed on the document, or whether or not it is printed, but by the user interface channel through which it is being viewed.

Metadata

Metadata is used in a CDMS for both controlling documents and also for searching, sorting and grouping documents and communicating information.

Some metadata is related to the document ID and some is related to the file. The metadata which is related to the document ID should only be that which does not change from one release to the next. For example, the title and subject matter would generally be related to the document ID, remaining constant across releases, while the specific release number, author and release date would be related to the file issued for a particular release. These relationships reflect the logical structure of the real-world data. The two are combined in search results data and the fact that they are stored separately should be invisible to the end user.

Release identifiers are variously referred to as version or revision numbers. (Engineering drawing releases are generally referred to as revisions, and pre-release copies as versions, but the terminology is often reversed in other contexts.) Organizations will have different policies as to how releases are identified. While typically referred to as 'numbers', release identifiers can sometimes be letters, or even dates. The CDMS should not constrain how release identifiers are formatted. The releases of a given document ID must also remain a single unified series, for example when the file type of the underlying source files changes.
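One way to keep release identifiers unconstrained is to store the order of issue separately from the identifier text. The sketch below assumes a hypothetical seq column for this purpose. The table, the sample identifiers and the file types are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE doc_release (
        doc_id      TEXT NOT NULL,
        seq         INTEGER NOT NULL,   -- order of issue (hypothetical column)
        release_id  TEXT NOT NULL,      -- free text: 'A', 'B', '0', '1', a date, ...
        source_type TEXT NOT NULL,      -- file type of the source rendition
        PRIMARY KEY (doc_id, seq),
        UNIQUE (doc_id, release_id)
    );
    INSERT INTO doc_release VALUES
        ('ABC-123', 1, 'A', 'dwg'),
        ('ABC-123', 2, 'B', 'dwg'),
        ('ABC-123', 3, '0', 'dwg'),
        ('ABC-123', 4, '1', 'rvt');     -- source file type changed, series continues
    """
)


def release_series(doc_id: str) -> list[str]:
    """Return release identifiers in order of issue, without parsing their format."""
    rows = conn.execute(
        "SELECT release_id FROM doc_release WHERE doc_id = ? ORDER BY seq",
        (doc_id,),
    )
    return [row[0] for row in rows]


series = release_series("ABC-123")
```

Because ordering comes from seq rather than from the identifier text, a series that moves from letters to numbers, or that changes source file type part way through, stays in one unbroken sequence. Sorting the identifier strings themselves would put '0' and '1' ahead of 'A' and 'B'.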

Basic principles of metadata management dictate that document control metadata must be related to files, not embedded within them. For example, the status of a file as the current latest release or as superseded should be recorded by linking a reference to the file to an entry in a lookup table. The status is changed by updating the link table without accessing or modifying the file, and the status is read without needing to read the file.
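The lookup-table approach can be sketched as follows. The three tables (status_lookup, doc_file, file_status), the status codes and the helper functions are assumptions for illustration, and the stored bytes are a placeholder rather than a real document.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE status_lookup (
        status_code TEXT PRIMARY KEY       -- the permitted status values
    );
    CREATE TABLE doc_file (
        file_pk INTEGER PRIMARY KEY,
        content BLOB NOT NULL
    );
    CREATE TABLE file_status (             -- link table: one status per file
        file_pk     INTEGER PRIMARY KEY REFERENCES doc_file(file_pk),
        status_code TEXT NOT NULL REFERENCES status_lookup(status_code)
    );
    INSERT INTO status_lookup VALUES ('latest'), ('superseded');
    INSERT INTO doc_file VALUES (1, x'255044462D');   -- placeholder bytes
    INSERT INTO file_status VALUES (1, 'latest');
    """
)


def content_digest(file_pk: int) -> str:
    """Hash the stored bytes so we can show that they never change."""
    row = conn.execute(
        "SELECT content FROM doc_file WHERE file_pk = ?", (file_pk,)
    ).fetchone()
    return hashlib.sha256(row[0]).hexdigest()


def set_status(file_pk: int, status_code: str) -> None:
    """Change status by updating the link table only. The file is not touched."""
    with conn:
        conn.execute(
            "UPDATE file_status SET status_code = ? WHERE file_pk = ?",
            (status_code, file_pk),
        )


def get_status(file_pk: int) -> str:
    """Read status from the link table without reading the file content."""
    return conn.execute(
        "SELECT status_code FROM file_status WHERE file_pk = ?", (file_pk,)
    ).fetchone()[0]


before = content_digest(1)
set_status(1, "superseded")
after = content_digest(1)
```

The digest of the file content is the same before and after the status change, which is the point: status lives beside the file, not inside it.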

In keeping with basic data structure principles, the metadata should be in a normalized structure. This ensures data integrity and efficiency by storing each element of data only once, referencing it via keys across related tables.

Storing Files

Keeping data consistent with its associated files is a matter of transactional integrity. Specifically, the transactions must be ACID-compliant (atomic, consistent, isolated and durable). Otherwise the data can readily become corrupted. A relational database engine can guarantee transactional integrity, but only if the binary content of the file is under the full control of the database.
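The atomicity part of this guarantee can be shown with a small sketch in which file bytes and metadata are written in one transaction. SQLite is used here purely for illustration. The tables, columns and the store_release helper are hypothetical, and the failure is simulated by passing a missing release identifier.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE doc_file (
        file_pk INTEGER PRIMARY KEY,
        content BLOB NOT NULL
    );
    CREATE TABLE file_meta (
        file_pk    INTEGER PRIMARY KEY REFERENCES doc_file(file_pk),
        release_id TEXT NOT NULL
    );
    """
)


def store_release(content: bytes, release_id: Optional[str]) -> bool:
    """Store file bytes and metadata as one unit. Neither persists if either fails."""
    try:
        with conn:  # one transaction: commit on success, roll back on any exception
            file_pk = conn.execute(
                "INSERT INTO doc_file (content) VALUES (?)", (content,)
            ).lastrowid
            conn.execute(
                "INSERT INTO file_meta (file_pk, release_id) VALUES (?, ?)",
                (file_pk, release_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def row_counts() -> tuple[int, int]:
    """Return (number of stored files, number of metadata rows)."""
    files = conn.execute("SELECT COUNT(*) FROM doc_file").fetchone()[0]
    metas = conn.execute("SELECT COUNT(*) FROM file_meta").fetchone()[0]
    return files, metas


ok = store_release(b"example file bytes", "1")
failed = store_release(b"example file bytes", None)  # metadata insert violates NOT NULL
```

In the failed call the file insert succeeds first, but the rollback removes it when the metadata insert is rejected, so no file is left without its metadata.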

The approaches to storing files and associated metadata have evolved over time. The most recent development is cloud object storage. Cloud object storage offers extreme levels of scalability and other benefits, but has the drawback of compromising transactional integrity compared with frameworks where the file binary content is under the full control of the database. For some solution needs transactional integrity is less important. For a CDMS solution however the integrity of the data is critical. The prospect of users being presented with an obsolete document, or a sensitive and restricted document, due to data corruption caused by a failed transaction for example, would not be acceptable. Transactional integrity with cloud object storage can be managed by the application layer, but it inherently can never be as robust as where it is entirely managed within the relational database engine.

Ultimately a CDMS user must be able to extract a document file from the system. As with document IDs, the CDMS should not constrain the user or organization with regard to how extracted files are named. The filename should generally simply match that given when the file was uploaded.

A SHORT HISTORY OF FILE STORAGE AND RELATIONAL DATABASES

The binary content of files can be stored directly within a binary column within a database table (referred to as a Binary Large Object – BLOB). This approach tends to create scalability and maintainability issues however. Conversely, when file binary data is stored external to the database the database loses its inherent ability to guarantee transactional integrity. If the data is stored external to the database then the application layer must assume responsibility for managing transactional integrity. In 2007/8 Oracle and Microsoft introduced solutions to this problem with SecureFiles and FILESTREAM respectively.

However, SecureFiles and FILESTREAM still present limitations in the context of extremely large and distributed architectures. Hence, for very large scale and distributed requirements (e.g. petabyte-scale volumes), cloud object storage services (such as Azure Blob Storage, Amazon S3 and Google Cloud Storage) have been developed as a more cost-effective solution. These services allow multi-tenanted, centrally managed storage at any scale, small or large.

The drawback of cloud object storage is that transactional integrity with an associated relational database must be mediated by an application layer. For solutions where transactional integrity is important, cloud object storage is inherently less robust in its integrity guarantees than options where the file binary data is under the full control of the relational database engine. Cloud object storage also introduces a security framework which is separate from and in addition to that of the relational database. All these challenges are manageable, but the solutions entail compromises. A security and integrity framework comprising an encapsulated single security model with transactional integrity fully assured, all within a single system without involvement of an application layer, is not possible with cloud object storage. However this framework is possible with a solution based on a relational database which retains full control of the file binary data content.

CONTROLLED DOCUMENT MANAGEMENT SYSTEM ARCHITECTURE

Scenarios

Below is a list of scenarios in day-to-day use of controlled document management systems. Some of these are very common.

Do these pathologies derive from implementation shortcomings, do they stem from software architectural limitations, or are they a combination of factors? Before answering that question we first need to take a closer look at how controlled documents work – separately from the question of the software or processes that manage them.

The Real-World Structure of Controlled Documents

Controlled documents are a special class of documents governed by particular rules. The outworking of those rules is that controlled documents inherently form a hierarchical structure of data and objects. Below is a conceptual example of the structure of a controlled document. It is independent of any software that might be used.

    Document Number [2415-004-C-102]
    │
    ├── Document Metadata (Title, Subject, etc.)
    │
    ├── Rev 2
    │   ├── Release Metadata (Date, Author, etc.)
    │   ├── PDF Rendition File
    │   └── CAD Source File
    │
    ├── Rev 1
    │   ├── Release Metadata
    │   ├── PDF Rendition File
    │   └── CAD Source File
    │
    ├── Rev 0
    │   ├── Release Metadata
    │   ├── PDF Rendition File
    │   └── CAD Source File
    │
    ├── Draft Rev B
    │   ├── Release Metadata
    │   ├── PDF Rendition File
    │   └── CAD Source File
    │
    └── Draft Rev A
        ├── Release Metadata
        ├── PDF Rendition File
        └── CAD Source File

In the above example a cross reference to 2415-004-C-102 inherently points to '2415-004-C-102 Rev 2 PDF Rendition File + Rev 2 Release Metadata + Document Metadata' (unless it refers to a specific revision, in which case it would read '2415-004-C-102 Rev 1'). This is important. A cross reference to a drawing or a procedure does not refer to a static file. It refers to the latest release, in the published or authoritative version format, of that series. When a new release is published the 'pointer' logically moves and refers to a different object. The document is not a tangible object as such. It is conceptually an abstract container comprising a set of objects. Rules apply to those objects, in particular their lifecycle state. There might be a Revision 3 for example which is still in draft form and hence is not yet current.
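The moving 'pointer' can be sketched in plain Python. The Release class, the REGISTER contents, the state names and the filenames are hypothetical. The resolve and publish functions are illustrative helpers, not part of any particular system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Release:
    release_id: str      # e.g. "Rev 2"
    state: str           # "draft", "current" or "superseded"
    published_file: str  # the published rendition


# Hypothetical register contents, oldest release first.
REGISTER: dict[str, list[Release]] = {
    "2415-004-C-102": [
        Release("Rev 0", "superseded", "rev0.pdf"),
        Release("Rev 1", "superseded", "rev1.pdf"),
        Release("Rev 2", "current", "rev2.pdf"),
        Release("Rev 3", "draft", "rev3.pdf"),
    ]
}


def resolve(doc_id: str, release_id: Optional[str] = None) -> Optional[Release]:
    """Resolve a cross reference to a release.

    With no release_id the reference follows the current release, so the
    result changes when a new release is published.
    """
    for rel in REGISTER.get(doc_id, []):
        if release_id is None and rel.state == "current":
            return rel
        if release_id is not None and rel.release_id == release_id:
            return rel
    return None


def publish(doc_id: str, release_id: str) -> None:
    """Make the named release current and supersede the previous current release."""
    REGISTER[doc_id] = [
        Release(
            r.release_id,
            "current" if r.release_id == release_id
            else "superseded" if r.state == "current"
            else r.state,
            r.published_file,
        )
        for r in REGISTER[doc_id]
    ]
```

Before Rev 3 is published, resolve("2415-004-C-102") returns Rev 2. After publish("2415-004-C-102", "Rev 3") the same call returns Rev 3, while a reference that names "Rev 1" explicitly keeps returning Rev 1.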

The above structure is quite different from that of arbitrary files that are typically found in corporate network folders, or files related to enterprise content management tasks such as invoices, correspondence or customer records. Files that serve those other purposes obey different sets of rules.

Mapping From The Outside In

The internal representation within the software system must map the structure and functionality of controlled documents described earlier. The most effective approach is one where the key business rules and entity relationships are mapped natively by the underlying schema. The greater the difference between the database schema and the real-world relationships, the greater the burden on programmatic layers to fill the gaps. Complexity proliferates. There is a point, however, where no amount of programming can fill the gap. This is why some seemingly simple user expectations (e.g. "Give me a Document-ID-Equals search function that will return exactly one result") ultimately become intractable: the underlying schema does not support them.

Hence the key design strategy requirement is this:

The system must be a faithful reflection of reality.

By 'reality' we mean the business logic model – the rules and implied expectations that the system needs to abide by.

The system does not generate reality – its job is to map it. Having a perfect schema doesn't guarantee that the user interface will work. What it guarantees is that it can be made to work. If the schema does not match reality however, then potentially the user interface can never be made to work how it should.

Critical Success Factors

Controlled document management systems exist to support outcomes in the real world. For example: People have access to the right procedures to follow to keep them safe. Designers have access to company standards so they can produce consistent designs to the required quality. The Finance team has a clear set of governance rules to work from. Workers can readily find accurate asset drawings essential for performing a task. Fabricators have the right drawings to manufacture the correct part or equipment.

The purpose of audits and certifications is to help ensure that the real-world outcomes are not compromised by shortcomings in the system that delivers critical documents to people. The purpose of the system is not to pass an audit. The purpose of the system is to support real-world outcomes. The purpose of the audit is to assess the system's effectiveness.

The end focus of a document management system is the users who read and apply the documents – document consumers. Without document consumers the entire system is superfluous. If the needs of document consumers are not being met then the system fails in its primary purpose. The most common manifestation of this is that users bypass the system entirely and create their own satellite repositories of documents – not because the users are lazy or untrained but because the system is ineffective.

Hence a key success criterion for a controlled document management system is this:

The system must serve the people using the documents.

That being the case, the most basic requirement of people using documents is this:

I need document ABC-123. Find it.

If the user becomes frustrated with the response to that simple need, then the system is not serving the users. They lose trust. Further, if the users expect that Revision 6 should be followed by Revision 7, then a software migration that results in Revision 6 being followed by Version 1 is not serving the users. If users expect that a document should only be known by one document ID, being that displayed on the document and cross references in other documents, yet within the management system it is known by three different IDs, then the system is not serving the users.

Consumption-Centric Design

A document consumer-centric design gives precedence to the following question:

How does a user find the document?

Ensuring this requirement is met may appear trivial. With the right architectural model and design philosophy it is trivial. But the common pathologies listed earlier demonstrate that the primacy of this requirement is often lost sight of. The needs of the document consumer appear to be an afterthought and are subordinated to other considerations, usually workflow control or collaboration related, or perhaps ease of development interests.

Overall, effective consumption-centric design is not just about achieving buy-in from document consumers. It is about avoiding exactly the types of incidents that a certified and audited document management system is supposed to prevent.

Bad outcomes occur in the real world when:

But most seriously, a poor system can force users to either make assumptions or make errors. Assumptions and errors can be precursors to accidents.

Cognitive Load

One of the purposes of software systems is to reduce cognitive load. If the solution to a problem is that users need to be educated then the system is increasing cognitive load, not reducing it.

Here is an example. A user finds a cross reference to a document. They perform a search using that ID and it returns a long list of documents. After some time spent searching through the listed documents they eventually ask a colleague and discover that the document they were looking for had been made obsolete. None of the documents returned was the document in question; they were other documents that merely still contained cross references to the obsolete document. The user wanted to search by document ID, but instead the system entered the string into a fuzzy full-text content search. The correct response to the query should have been either zero records or a record indicating that the document was obsolete.
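The expected behaviour in this example is an exact-match lookup against the register. A minimal sketch follows, assuming a hypothetical document_register table with invented sample rows. One of the rows deliberately mentions the obsolete ID in its title, to show that text containing an ID is not the same thing as the ID itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE document_register (
        doc_id TEXT PRIMARY KEY,
        title  TEXT NOT NULL,
        status TEXT NOT NULL
    );
    INSERT INTO document_register VALUES
        ('ABC-123', 'Confined space entry', 'obsolete'),
        ('ABC-200', 'Permit to work (replaces ABC-123)', 'current');
    """
)


def find_by_id(doc_id: str) -> list[tuple[str, str, str]]:
    """Exact match on the register: returns zero rows or exactly one row."""
    return conn.execute(
        "SELECT doc_id, title, status FROM document_register WHERE doc_id = ?",
        (doc_id,),
    ).fetchall()
```

A lookup for 'ABC-123' returns a single row whose status is 'obsolete', and a lookup for an unknown ID returns nothing. The document that merely mentions ABC-123 is never returned.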

Patterns

The scenarios described earlier might be viewed as isolated issues arising from search configuration, migration activities, user training, metadata quality, or individual implementation decisions. While these factors can contribute, such issues frequently occur together and persist across different systems, organizations, products and technology generations. But there is a common theme.

All of the complaints and scenarios listed earlier can be summed up in one statement.

The behaviour of the system does not align with reality.

That is the pattern.

Hence the summary of the problem is this:

Reality is consistent. Some systems are better than others at aligning with reality.

The users are not misguided. Their expectations are not obscure or unreasonable. They don't need to be educated. The problem is that the system simply does not match the reality of how controlled documents are supposed to work.

Root Causes

All of the pathologies described earlier can be traced back to one or more of three causes related to the software.

An example of a system-centric approach is a notion that if the system is certified and passes an audit then all is good. User complaints are a distant secondary consideration to auditor complaints. The system might superficially appear to be fully compliant but hidden edge cases exist which breach core controlled document management principles. Or perhaps the system and all its content is in perfect order but the user experience is so terrible that end users go out of their way to bypass the system entirely.

Most document management systems are workflow and collaboration-centric, or file sharing-focussed. These systems prioritize the process over the product. The system is designed for high-velocity, rigorously controlled creation and editing of documents. The system is explicitly designed to prevent users from following a different workflow management system outside of the document management system, even if it could actually be more efficient. Sometimes the system is also designed to prevent users from having control over key metadata such as the release identifier, e.g. whether it is a number or letter or both, whether it is called a Revision or a Version, where it starts, or whether it is allowed to change format mid-series from a letter sequence to a number sequence. Sometimes a proxy field might be offered as a workaround, sometimes not. Typically, ease of collaborative editing is the key focus. The ability of end users to effectively find what they need, and the alignment of system behaviour with users' expectations of how the business rules should work, are simply assumed to be taken care of. But often they are not. Many document control systems are essentially collaboration and file sharing tools pretending to be document registers or controlled document management systems. They are file catalogues with workflow engines.

The third root cause of controlled document system pathologies is deeply architectural, but it can be identified very simply: there is no document register table. This root cause revolves around a question of identity.

Commonly the document ID is treated as merely a peer metadata field alongside other metadata. The primary identity is not the document ID that the user deals with. The primary identity is an object identifier that is decoupled and abstracted from the document ID. The key difference is this:

Without a unique namespace for the document ID, the system is fundamentally incapable of being a System of Record (source of truth).

The document ID that the user sees printed on a document is what can be called a Natural Key. Alternatively a GUID, for example, is what can be called a Surrogate Key. The problem is not the use of a database-level surrogate key as opposed to a natural key. The problem is a poorly designed surrogate key system where the surrogate key is an authority that is elevated above the document numbers printed on the documents – an inversion of authority. This flatly contradicts the real-world user experience of document identity. The result is identity confusion. In data modelling, if a surrogate key must be used then the surrogate key should be the Primary Key for internal indexing, but the natural key should have a Unique Constraint for business authority.
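The data modelling rule in the last sentence above can be sketched as follows. The surrogate key is the primary key, and the natural key carries its own unique constraint. The table and column names, and the use of a UUID as the surrogate, are assumptions made for the example.

```python
import sqlite3
import uuid
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE document_register (
        internal_key TEXT PRIMARY KEY,       -- surrogate key, for internal joins only
        doc_id       TEXT NOT NULL UNIQUE    -- natural key, the business authority
    )
    """
)


def register(doc_id: str) -> bool:
    """Add a document ID. A fresh surrogate key never excuses a duplicate doc_id."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO document_register VALUES (?, ?)",
                (str(uuid.uuid4()), doc_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def internal_key_for(doc_id: str) -> Optional[str]:
    """Resolve the user-facing ID to the internal key, never the other way round."""
    row = conn.execute(
        "SELECT internal_key FROM document_register WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    return row[0] if row else None


first = register("ABC-123")
second = register("ABC-123")  # new surrogate key, same natural key: rejected
```

Each call generates a different surrogate key, so only the unique constraint on doc_id stops the second registration. Users only ever supply the document ID, and the internal key stays an implementation detail.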

What this object-abstracted architecture (based on object-oriented principles) means is that the inherent real-world structure of controlled documents described earlier does not exist at the schema level. All the functionality required to make controlled document IDs behave as they should must be managed programmatically. Furthermore, the primary identity is owned by the software, not the organization.

One common manifestation of this architecture shortcoming is that document ID uniqueness is not natively enforced – a special script must be added to the application to enforce that essential rule. This is symptomatic of a system that was not designed with controlled document management in mind at the outset, and the underlying schema does not align with the real-world entity relationship structure of controlled documents. It lacks declarative integrity. By forcing uniqueness to be managed via application scripts instead of schema-level unique constraints the system invites data corruption and rejects its role as a System of Record.

In some cases there is a central document ID register of GUIDs (which are technical identifiers, not business identifiers) with a folder-based management model built on top of it. When user-facing identity becomes defined by a path rather than a unique single ID string, confusion arises. In these types of systems the resolving of a document is largely tied to where it is rather than what it is or who it is. For example, a document may appear to have been deleted when actually it has been 'moved'.

Systems which are built around location-centric folder-based-model management philosophies (even if the files are not actually stored in real folders) commonly create broken link, orphaned file, access denial and other problems which hamper users trying to locate information – exactly the types of pathologies described earlier. In a register-centric system (i.e. a system with a central lookup table or primary entity master table) the resolving of the document always comes back to who it is. The who that the database knows is exactly the same who that the user knows. Resolving something primarily by where it is, as opposed to what or who, flows from a pre-computer-era mindset. Data systems transitioned from hierarchical models to relational models (i.e. SQL) in the 1980s. Hierarchical folders can be helpful visual aids, which is why they are commonly used. But as a data structure for managing controlled documents, or metadata in general, they are a poor map of reality.

The concept of a primary identifier that is abstracted from the document ID can be useful to help solve various document control problems, such as changing document IDs or resolving collisions or system mergers. But in general this architecture did not necessarily evolve as a solution to controlled document management problems. In most cases it probably arose as an outworking of architectures implemented as an object-oriented design approach as opposed to a relational model design approach, at a time when the industry viewed object orientation as the solution to every architectural problem. When an object-oriented design is implemented as an object-abstracted design that prioritizes developer convenience by hiding the schema behind opaque surrogate keys, it often inadvertently creates a 'black box' database that lacks the declarative integrity (e.g. natively enforced uniqueness) and semantic clarity (accurately mirroring the real world) required for long-term operational transparency and reliable data analysis. The system also becomes application-centric rather than data-centric. This is often referred to as an object-relational impedance mismatch.

Most documents on corporate networks do not fit into a rigid structured framework. They can be sorted, grouped, arranged and found using metadata and content searching. In some cases a schema was designed for that purpose, and then the requirements of controlled documents were constructed as an additional layer. This is one part of the problem. When this is combined with an object-abstracted schema model, the data's accessibility and transparency are inadvertently sacrificed, resulting in an architectural mismatch. This is why simple needs such as "I just want a single search field that will deterministically search all document IDs, of all types" can become problematic. If there is a single global document ID register then the implementation of this request is trivial. But when the primary identifier has been abstracted away and there is no master document ID register at the schema level then functionality starts to diverge from expectations. The business identity has been abstracted to the point of operational dysfunction. The end result is a system that serves its own architecture rather than the people using the documents.

The problem gets worse if either:

The problem can be compounded even further by policy decisions such as dictating a metadata-structured ID system. An example is a numbering structure that reflects the organizational structure and therefore has to change whenever the organizational structure changes.

If the software uses a simple integer sequence number as the document ID, and document IDs are migrated to that system, then this will almost guarantee that collisions will occur as a result of a corporate merger for example. GUIDs take the opposite extreme for avoiding collisions. But as mentioned earlier, GUIDs are not suitable as user-exposed identifiers because they are not human-friendly.
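The collision risk is easy to see with a toy example. The two registers below are invented, as are the ACME and BETA prefixes. The second half assumes that each organization had issued its IDs within its own namespace from the outset, in the sense of the originating-register structure discussed earlier.

```python
def collisions(register_a: set[str], register_b: set[str]) -> set[str]:
    """IDs that exist in both registers and would clash after a merge."""
    return register_a & register_b


def with_namespace(prefix: str, ids: set[str]) -> set[str]:
    """The same IDs as they would look if issued under an originating namespace."""
    return {f"{prefix}-{i}" for i in ids}


# Two hypothetical companies that both numbered their documents 1, 2, 3, ...
company_a = {str(n) for n in range(1, 6)}   # '1' to '5'
company_b = {str(n) for n in range(3, 9)}   # '3' to '8'

clashes = collisions(company_a, company_b)
namespaced_clashes = collisions(
    with_namespace("ACME", company_a),
    with_namespace("BETA", company_b),
)
```

With bare sequence numbers, every number that both companies have reached is a clash. With a namespace in the ID there are none, and the IDs remain short enough to read aloud and transcribe.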

Solutions which require documents to be changed, not to fix errors on the documents but to cater for constraints of the system, are symptomatic of a system that was not designed to mirror the real world.

Why This Matters

The benefits of the document register-centric approach are substantial, particularly in environments containing large volumes of long-lived technical documentation. Engineering drawing repositories, plant documentation systems, infrastructure records and asset information systems often contain hundreds of thousands of documents accumulated over decades. Some of those documents that are decades old can still be highly relevant and valuable. In these environments document numbers are the primary navigation keys used by engineers, operators, maintainers and contractors. Preserving the integrity of those references is vital to the end users served by the document management system.

A document register-centric model provides:

Consistent Retrieval

Reliable Cross-Referencing

Elimination of Metadata Workarounds

Improved Long-Term Integrity

Alignment with User Expectations

Integrity-Driven Compliance

Under this model, document identity is derived from the documents themselves, not from an internal software construct. The document identifier is also expected to be immutable. An ID may be cancelled and superseded with an associated new published document that bears the new ID, but it cannot be changed or substituted with a surrogate.

Note that the document ID is distinct from the file ID. File IDs are not what is cross referenced from other documents. File IDs do not (or should not) appear on the printed document. Hence files can be assigned any arbitrary unique identification system, including sequence numbers or GUIDs. Referencing of file IDs only occurs within the controlled document management system as an object subordinate to the document ID. This distinction between the file and the abstract document is a critical concept.

Conclusions

A document register-centric system recognizes the primacy of the documents. The system exists to support management of controlled documents.

This distinction between object-abstracted (or file or folder-centric) and document register-centric fundamentally changes how document identity, cross-referencing, revision control and information retrieval are managed.

In a document register-centric model the management system does not define and own document identity or release identifier designations. It recognizes and preserves the identity that already exists within the documents themselves, or whatever the user wants to define it as. This single principle eliminates many of the inconsistencies, workarounds and user frustrations that have become accepted as normal within the document management industry.

A user-centric design prioritizes the focal endpoint of any document management system: the document consumer who must find and apply the document.

The quality of a controlled document system should be evaluated primarily by how reliably document consumers can identify, locate, trust, and use documents using the identifiers and relationships that exist within the document set itself.

Silkwood Software

July 2026