Living glossaries and data dictionaries

Reflections on ‘Living Documentation’, by @cyriux 2021-07-06 #documentation #book


The book Living Documentation, by Cyrille Martraire, explores the concept of living documentation. Chapter 6 introduces living glossaries, along with two other examples of living documentation.

Identifying core concepts

The core recommendation for a living glossary, like examples from previous chapters, requires extracting knowledge from the software:

Extract the glossary of the ubiquitous language from the source code. (p159)

Ubiquitous language refers to the subject-matter terminology whose importance Domain-Driven Design popularised. This trend created a mainstream software development practice of using problem domain terminology in code, so that when the code refers to an order, say, this means precisely what it means when domain experts use the word order.

This approach implies that developers will define domain vocabulary in code comments. So while the idea makes sense, it won’t always work in practice: one does not lightly suggest that programmers should write comments.
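To make the idea concrete, here is a minimal sketch of glossary extraction, assuming Python code whose domain classes carry their definitions in docstrings. This is not Martraire’s implementation; the `extract_glossary` function and the example classes are hypothetical, invented for illustration.

```python
import ast


def extract_glossary(source: str) -> dict[str, str]:
    """Map each documented class name to the first line of its docstring."""
    glossary = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            doc = ast.get_docstring(node)
            if doc:  # undocumented classes contribute no glossary entry
                glossary[node.name] = doc.splitlines()[0]
    return glossary


# Hypothetical domain code, inlined as a string so the sketch is self-contained.
EXAMPLE_SOURCE = '''
class Order:
    """A customer's request to purchase one or more products."""

class OrderLine:
    """One product and quantity within an order."""

class OrderRepositoryImpl:
    pass
'''

if __name__ == "__main__":
    for term, definition in sorted(extract_glossary(EXAMPLE_SOURCE).items()):
        print(f"{term}: {definition}")
```

Note that the sketch only finds terms that a developer has already documented, which is exactly the dependency on comment-writing described above.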

Optimising for writers

The wisdom of Martraire’s approach lies in optimising for the writers. Over the years, my personal software development context has changed several times, and I have changed perspective along several dimensions:

My past context	My present/future context
Organisation focus	Customer focus
Professional services	Product development
Developer-only teams	Cross-functional teams with product managers and designers
One small development team	Multiple collaborating product teams
Developers write, maybe	Technical writers write

While Martraire’s approach would have suited my past context, I now work with non-coding writers who cannot easily edit prose embedded in code. Meanwhile, product designers and other non-coders use terminology before coders do, in feature briefs and design mock-ups. In this context, optimising for writers means something else.

Glossary-first development

I currently use a collaboratively-maintained domain glossary, which represents a single source of truth for various artefacts, such as content marketing and source code. This takes the form of a Notion database, whose entries we tag with bounded contexts that indicate their ownership.

For maintainability, we limited our glossary’s scope to terms that the user interface uses. This user-interface-centric approach aligns well with customer focus. As long as we don’t broaden this scope to content marketing, the software uses every term in the glossary.

We could therefore build a reconciliation mechanism to identify missing glossary entries. The software build could use the Notion API to create blank glossary entries for undefined core concepts used in the code. That would notify the glossary maintainers (product managers) that a term needs defining.
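The reconciliation step could look something like the following sketch. It assumes the build has already collected the terms used in the code and the terms defined in the glossary as plain strings. The `create_blank_entry` callback is a hypothetical stand-in for whatever call would create the entry through the Notion API; no real Notion client code appears here.

```python
from typing import Callable, Iterable


def normalise(term: str) -> str:
    """Lower-case a term and collapse whitespace, so 'Order  Line' matches 'order line'."""
    return " ".join(term.lower().split())


def find_undefined_terms(code_terms: Iterable[str], glossary_terms: Iterable[str]) -> list[str]:
    """Return the normalised terms that the code uses but the glossary lacks."""
    defined = {normalise(term) for term in glossary_terms}
    used = {normalise(term) for term in code_terms}
    return sorted(used - defined)


def reconcile(
    code_terms: Iterable[str],
    glossary_terms: Iterable[str],
    create_blank_entry: Callable[[str], None],  # hypothetical hook, e.g. wrapping a Notion API call
) -> list[str]:
    """Create a blank glossary entry for each undefined term and return those terms."""
    missing = find_undefined_terms(code_terms, glossary_terms)
    for term in missing:
        create_blank_entry(term)
    return missing


if __name__ == "__main__":
    created: list[str] = []
    reconcile(["Order", "Invoice", "order line"], ["order", "Order Line"], created.append)
    print(created)
```

Passing the entry-creation step in as a callback keeps the comparison logic testable without network access, and leaves the choice of glossary tool open.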

Data dictionary documentation

We used to call the domain glossary a data dictionary - part of a system’s data model documentation. Terminology and development methods have changed since then; the value of capturing this kind of knowledge has not.
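As an illustration of a data dictionary in this sense, here is a minimal sketch that keeps each field’s meaning next to its declaration and lists name, type and description together. It assumes Python dataclasses with descriptions stored in field metadata; the `Order` model, its fields and their descriptions are hypothetical.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Order:
    # Hypothetical fields; each description lives in the field's metadata.
    reference: str = field(
        metadata={"description": "Identifier the customer quotes when contacting support."}
    )
    total_cents: int = field(
        metadata={"description": "Order total in cents, including tax."}
    )


def data_dictionary(model) -> list[tuple[str, str, str]]:
    """List (field name, type name, description) for each field of a dataclass."""
    return [
        (
            f.name,
            getattr(f.type, "__name__", str(f.type)),
            f.metadata.get("description", ""),
        )
        for f in fields(model)
    ]


if __name__ == "__main__":
    for name, type_name, description in data_dictionary(Order):
        print(f"{name} ({type_name}): {description}")
```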

Meanwhile, modern API documentation often fails to explain what the data means. The Spotify Web API Object Model offers an atypical counterexample, documenting its jargon such as danceability:

Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity.

In this product API context, you no longer need to extract documentation from the code; the glossary already exists as part of the product. The ultimate fusion of developer experience, user experience and domain language elevates the glossary to product component status.

Writing by Peter Hilton
