Writing by Peter Hilton

Internationalise data

Better software through commoditised domain modelling 2022-01-18 #data #DDD

Nareeta Martin

Internationalisation doesn’t only apply to text and user interfaces. Software products can also improve their reliability and usability by internationalising their data.

Data internationalisation typically involves adopting an international data representation standard for formatting or storing data values (or both). These standards occupy a spectrum from universal applicability to highly domain-specific obscurity.

Notable examples

📆 ISO 8601 - date and time data interchange standard - specifies date and time formats. Use these to write machine-readable dates in text serialisation formats such as JSON. They are also unambiguous for people to read, although most people prefer a region-specific local format.

🧧 ISO 4217 - currency codes - identifies various currencies. Wikipedia notes that while the ISO standard doesn’t specify how to format currency amounts, the European Union’s Publications Office specifies using the currency code and the amount, separated by a non-breaking space. The order depends on the language, e.g. 42 EUR or EUR 42.

☎️ E.164 - The international public telecommunication numbering plan - formats international telephone numbers using a plus sign, followed by up to 15 digits, with no spaces or punctuation. Use this format as a canonical storage format, and accept it for user input, in case someone copies and pastes a number in this format.

🧧 IBAN - International Bank Account Number - identifies bank accounts in Europe and various other countries, but not China, India or the US.

📕 ISBN - International Standard Book Number - identifies published books. Don’t confuse ISBN with the similar 🍷 ISWN, 📰 ISSN, and 🎼 ISMN identifiers.

📦 EAN - International/European Article Number - more generally identifies retail products, and appears on product packaging as the familiar bar codes.
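As an illustration of how little code a canonical format needs, here is a minimal Python sketch of the E.164 shape described above: a plus sign followed by up to 15 digits. The names `is_e164` and `strip_formatting` are hypothetical helpers, and the sketch assumes a leading digit of 1 to 9, because country codes do not start with zero. It checks the shape only, not whether the number is actually assigned.

```python
import re

# Shape of an E.164 number: a plus sign, then up to 15 digits.
# Assumption: the first digit is 1-9, since country codes do not start with 0.
E164_PATTERN = re.compile(r"\+[1-9][0-9]{1,14}")


def is_e164(number: str) -> bool:
    """Return True if the string is already in canonical E.164 form."""
    return E164_PATTERN.fullmatch(number) is not None


def strip_formatting(user_input: str) -> str:
    """Remove whitespace, brackets, dots and hyphens from pasted input."""
    return re.sub(r"[\s().-]", "", user_input)
```

One possible use is to store only values that pass `is_e164`, and to apply `strip_formatting` to user input first, so that a pasted number such as `+31 10 123-4567` is accepted and stored as `+31101234567`.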

Benefits

Unless you enjoy data migration projects, or you find yourself in the improbable position of producing software that will only ever serve one geographic region, internationalise your data. Making your data suitable for multiple regions reduces the risk of headaches when you focus on new geographies.

More importantly, internationalisation includes more people who would otherwise not successfully use your product. People move around the world, both temporarily and permanently. Despite their frequent Western bias, international standards tend to deliver better inclusivity than naive local designs.

More subtly, third-party standards let you benefit from other people’s documentation, and predefined validation rules. This commoditises some of your domain modelling so you can focus on solving problems for your customers.
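IBAN gives one example of such a predefined validation rule: move the first four characters to the end, replace each letter with a number (A = 10 up to Z = 35), and check that the result leaves a remainder of 1 when divided by 97. The following Python sketch implements that check under a hypothetical function name, `is_valid_iban`. It deliberately leaves out the country-specific length and structure rules that a production validator would also apply.

```python
import re


def is_valid_iban(iban: str) -> bool:
    """Check the general IBAN shape and its mod-97 check digits."""
    compact = iban.replace(" ", "").upper()
    # Two-letter country code, two check digits, then up to 30 characters.
    if not re.fullmatch(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{1,30}", compact):
        return False
    # Move the country code and check digits to the end.
    rearranged = compact[4:] + compact[:4]
    # int(ch, 36) maps 0-9 to 0-9 and A-Z to 10-35.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```

With the widely published example `GB82 WEST 1234 5698 7654 32`, the function returns `True`, and changing the final digit makes it return `False`.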

Values and identifiers

Dates and currency amounts, such as 2021-12-29 and EUR 42, represent values. The other ‘codes’ and ‘numbers’ represent unique identifiers. These both enable software interoperability, but in slightly different ways.
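To make the idea of a value concrete, a currency amount can be modelled as a small immutable type that pairs an amount with an ISO 4217 code. The Python sketch below is one possible shape, not a definitive design: the `Money` class, its `format` method and the `code_first` parameter are hypothetical names, and the currency check only tests for three uppercase letters rather than looking the code up in the ISO 4217 list.

```python
import re
from dataclasses import dataclass
from decimal import Decimal

NBSP = "\u00a0"  # non-breaking space between code and amount


@dataclass(frozen=True)
class Money:
    """A currency amount as a value: equal fields mean equal values."""

    amount: Decimal
    currency: str  # ISO 4217 alphabetic code, e.g. "EUR"

    def __post_init__(self) -> None:
        # Shape check only; this sketch does not consult the ISO 4217 list.
        if not re.fullmatch(r"[A-Z]{3}", self.currency):
            raise ValueError(f"not a three-letter currency code: {self.currency!r}")

    def format(self, code_first: bool = True) -> str:
        """Join code and amount with a non-breaking space, in either order."""
        parts = [self.currency, str(self.amount)]
        if not code_first:
            parts.reverse()
        return NBSP.join(parts)
```

Two `Money` instances with the same amount and currency compare equal, which is what distinguishes a value from an identifier. The `code_first` flag reflects that the order of code and amount depends on the language.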

Using ISBN to identify books lets you work with other systems’ book data, so you can find data about the same book as one in your system, without mixing up different books, such as different editions of the same title. Standard identifiers make data comparable between separate systems.
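Comparing identifiers between systems works best when both sides use the same canonical form. As a sketch, the Python code below normalises an ISBN to 13 digits without hyphens. It converts a 10-digit ISBN by adding the 978 prefix and recalculating the check digit, using the same check digit calculation as EAN-13. The function names are hypothetical, and the sketch validates check digits only, not whether the ISBN has actually been issued.

```python
import re


def ean13_check_digit(first12: str) -> int:
    """Check digit for 12 digits, using alternating weights 1 and 3."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_isbn10(isbn10: str) -> bool:
    """Validate a 10-character ISBN using weights 10 down to 1, modulo 11."""
    if not re.fullmatch(r"[0-9]{9}[0-9X]", isbn10):
        return False
    values = [10 if ch == "X" else int(ch) for ch in isbn10]
    return sum(v * w for v, w in zip(values, range(10, 0, -1))) % 11 == 0


def to_isbn13(isbn: str) -> str:
    """Return the 13-digit form of a valid ISBN, without separators."""
    compact = isbn.replace("-", "").replace(" ", "").upper()
    if re.fullmatch(r"[0-9]{13}", compact):
        if ean13_check_digit(compact[:12]) == int(compact[12]):
            return compact
    elif is_valid_isbn10(compact):
        body = "978" + compact[:9]
        return body + str(ean13_check_digit(body))
    raise ValueError(f"not a valid ISBN: {isbn!r}")
```

With this normalisation, `to_isbn13("0-306-40615-2")` and `to_isbn13("978-0-306-40615-7")` both return `9780306406157`, so two systems that store the same edition in different forms can still match their records.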

Using ISO 8601 date formats provides a more basic benefit: lower implementation costs and fewer bugs when exchanging date data between systems in the first place, rather than specifically comparing dates. Standard value formats reduce the cost of reliable data exchange.
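The lower implementation cost shows up in how little code a round trip through JSON needs when both sides agree on ISO 8601. This Python sketch uses only the standard library. The field names `published` and `updated`, the function names and the example time are illustrative assumptions.

```python
import json
from datetime import date, datetime, timezone


def encode_event(published: date, updated: datetime) -> str:
    """Serialise a date and a timestamp as ISO 8601 strings in JSON."""
    return json.dumps(
        {
            "published": published.isoformat(),
            "updated": updated.isoformat(),
        }
    )


def decode_event(text: str) -> tuple[date, datetime]:
    """Parse the ISO 8601 strings back into date and datetime objects."""
    data = json.loads(text)
    return (
        date.fromisoformat(data["published"]),
        datetime.fromisoformat(data["updated"]),
    )


# Example values: the date comes from the text above, the time is made up.
example = encode_event(
    date(2021, 12, 29),
    datetime(2021, 12, 29, 9, 30, tzinfo=timezone.utc),
)
```

Here `example` contains `"2021-12-29"` and `"2021-12-29T09:30:00+00:00"`, and `decode_event(example)` returns the original values. Neither side needs a custom date pattern or a guess about day and month order.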
