Michael's unofficial guide to `ruby-i18n/i18n`

As of 2022-08-30, ruby-i18n/i18n is the ninth most popular Ruby gem.

Yet, its documentation, performance, and code quality leave a lot to be desired. While I contribute bug reports and fixes whenever I can, I found I needed a place to jot down the issues, history, and surprises I’ve discovered while working with this library.

This document will serve as a place for me to collect my notes about the library. It may or may not end up being valuable to others.

- Features
  - Backends
  - Interpolation
- Global configuration options
- Lookup options
- Internal options
- Other bits
  - i18n reserved key

Features

Backends

Backends are bits of composable functionality that are meant to be combined by calling include on the Simple backend.

I just use the Fallbacks and Pluralization backends:

I18n::Backend::Simple.include(
  I18n::Backend::Fallbacks,
  I18n::Backend::Pluralization,
)

Interpolation

i18n supports two different interpolation syntaxes:

Modern: %{} syntax (e.g., greeting: Hello %{name}!)
Ancient: Ruby 1.9 format-string (Kernel#format) syntax (e.g., greeting: Hello %<name>s!) # TODO: link to the Kernel#format docs

Always use the former.
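Both syntaxes mirror the named references that Ruby's own Kernel#format understands, so the difference can be sketched without the gem at all. This is plain Ruby, not ruby-i18n/i18n's interpolation code, and the greeting strings and the name value are made up for illustration.

```ruby
# Plain Ruby, not the gem's interpolation code: Kernel#format understands
# the same two named-reference styles.
modern  = format("Hello %{name}!", name: "Michael")   # %{}: value inserted as-is
ancient = format("Hello %<name>s!", name: "Michael")  # %<>: needs a format directive (here `s`)

puts modern   # prints "Hello Michael!"
puts ancient  # prints "Hello Michael!"
```

The %<> form needs a trailing format directive, which is one more thing for a translator to get wrong.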

Locale codes

Locale codes can be pretty much anything, but the most common form is an ISO 639 language code (e.g., en for “English”) optionally followed by an ISO 3166 region code (e.g., en-CA for “English, as written in Canada”).

But there are other forms that are also very common.

For example, it’s common to have locale codes contain a script:

zh-Hant-TW (“Chinese, written in the Traditional Chinese script, as written in Taiwan”)
uz-Cyrl (“Uzbek, written in Cyrillic script”)
uz-Arab-AF (“Uzbek, written in Arabic script, as written in Afghanistan”)

ISO 15924 defines the valid script codes, and CLDR defines the default script as Latn (i.e., Latin script).

IETF RFC 5646 / BCP 47 defines “Language tags” as having up to 6 segments:

langtag = language
          ["-" script]
          ["-" region]
          *("-" variant)
          *("-" extension)
          ["-" privateuse]

In practice, you’ll see all sorts of fun locale codes requested by users, both valid and invalid (e.g., en-PIRATE), and the library will need to determine which locale is the best one to start the search for translation data from. CLDR defines an algorithm and data to help with this determination.
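As a rough illustration of the grammar above, the sketch below splits a tag into its leading segments with a simplified regular expression. It is not a full RFC 5646 parser: everything after the region is lumped into one hypothetical rest group, and the LANGTAG and parse_langtag names are mine, not the library's.

```ruby
# Simplified pattern: language, optional script, optional region, then
# whatever is left (variants, extensions, private use) as one blob.
LANGTAG = /\A
  (?<language>[a-z]{2,3})
  (?:-(?<script>[a-z]{4}))?
  (?:-(?<region>[a-z]{2}|\d{3}))?
  (?<rest>(?:-[a-z0-9]+)*)
\z/ix

# Returns a Hash of the named segments, or nil if the tag does not match.
def parse_langtag(tag)
  match = LANGTAG.match(tag)
  return nil unless match

  match.named_captures.transform_keys(&:to_sym)
end

tag = parse_langtag("zh-Hant-TW")
tag[:language]  # => "zh"
tag[:script]    # => "Hant"
tag[:region]    # => "TW"

parse_langtag("en-150")[:region]   # => "150"
parse_langtag("en-PIRATE")[:rest]  # => "-PIRATE"
```

Note that en-PIRATE still matches, with PIRATE landing in rest. Being well-formed is not the same as being a locale you have data for.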

Fallbacks

There are thousands of locales that a global company will see users requesting, and they can generally be thought of as forming a hierarchy from the most specific to the most general:

ca-ES-VALENCIA (“Catalan, as written in Valencia, in Spain” aka. Valencian)
ca-ES (“Catalan, as written in Spain”)
ca (“Catalan”)

You won’t have perfect localization data for every locale at every level of granularity. In order to gracefully degrade the UI, you want to provide something “good enough” for the user when you don’t have data matching their specific locale. This is done by defining locale fallbacks.

Even if you had perfect and complete data for every locale, there is so much overlap in the data between locales and their ancestors that you wouldn’t want to keep n copies of the data in memory. By pushing all the duplicated data into the most general ancestor possible, you can save a lot of memory and disk space, at the cost of increased complexity and CPU use in the lookup code.

`root` locale

CLDR defines a root locale that is the ancestor of all locales (and thus contains data that is common to many locales).

Ancestors

Note that in many cases, finding the ancestor of a locale can be done naively by chopping off the last segment of the locale (i.e., the last - and everything after it). However, there are many examples where this is not the case, and “hyphen chopping” gives the wrong result.

The fallback chain for en-CH is:

en-CH (“English, as written in Switzerland”)
en-150 (“English, as written in Europe”. Note that 150 is defined by UN M.49, not ISO 3166)
en (“English”)
root

The fallback chain for zh-Hant-TW is:

zh-Hant-TW (“Chinese, written in the Traditional Chinese script, as written in Taiwan”)
zh-Hant (“Chinese, written in the Traditional Chinese script”)
root

Note that we don’t fall back to zh here, since zh is written in Simplified Chinese characters, while zh-Hant is written using Traditional Chinese characters. Falling back to zh would mean switching scripts, which wouldn’t make sense.
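The difference between hyphen chopping and the chains above can be sketched in a few lines. This is an illustration only: EXPLICIT_PARENTS is a hypothetical, hand-written table holding just the two exceptions discussed above, where a real implementation would load its parent data from CLDR.

```ruby
# Parents that hyphen chopping gets wrong. These two entries are hand-copied
# from the chains described above; real parent data would come from CLDR.
EXPLICIT_PARENTS = {
  "en-CH"   => "en-150",
  "zh-Hant" => "root",
}.freeze

# Naive approach: drop the last "-segment"; a bare language falls back to root.
def naive_parent(locale)
  return nil if locale == "root"

  head, separator, _tail = locale.rpartition("-")
  separator.empty? ? "root" : head
end

# Walks parents until root, preferring explicit data over hyphen chopping.
def ancestor_chain(locale, parents: EXPLICIT_PARENTS)
  chain = []
  while locale
    chain << locale
    locale = parents.fetch(locale) { naive_parent(locale) }
  end
  chain
end

ancestor_chain("en-CH")               # => ["en-CH", "en-150", "en", "root"]
ancestor_chain("en-CH", parents: {})  # => ["en-CH", "en", "root"] (hyphen chopping only)
ancestor_chain("zh-Hant-TW")          # => ["zh-Hant-TW", "zh-Hant", "root"]
```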

The Fallbacks backend simply falls back through each ancestor of the locale until the lookup succeeds. Which locales to fall back through for any given locale is defined by I18n.fallbacks.

CLDR provides data that can be used by I18n.fallbacks to determine the fallback chain.
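The "fall back through each ancestor until the lookup succeeds" behaviour can be sketched as below. This is not the Fallbacks backend's code: lookup_with_fallbacks, FALLBACK_DEMO_DATA, and the placeholder translation strings are all invented for the example, and the chain is passed in by hand rather than read from I18n.fallbacks.

```ruby
# Hypothetical data: locale => { key => translation }.
FALLBACK_DEMO_DATA = {
  "ca-ES-VALENCIA" => { "farewell" => "farewell from ca-ES-VALENCIA" },
  "ca"             => { "greeting" => "greeting from ca", "farewell" => "farewell from ca" },
}.freeze

# Tries each locale in the chain, in order, and returns the first hit.
def lookup_with_fallbacks(data, chain, key)
  chain.each do |locale|
    value = data.dig(locale, key)
    return value unless value.nil?
  end
  nil
end

chain = ["ca-ES-VALENCIA", "ca-ES", "ca", "root"]
lookup_with_fallbacks(FALLBACK_DEMO_DATA, chain, "farewell")  # => "farewell from ca-ES-VALENCIA"
lookup_with_fallbacks(FALLBACK_DEMO_DATA, chain, "greeting")  # => "greeting from ca"
lookup_with_fallbacks(FALLBACK_DEMO_DATA, chain, "missing")   # => nil
```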

Pluralization

The way that words are pluralized varies dramatically across locales.

There are currently six different pluralization keys defined in CLDR: few, many, one, other, two, and zero. Each language uses its own subset of these keys, and the rules for when a given key applies differ from language to language. Every language uses at least the other key: a language with no grammatical concept of pluralization (e.g., zh, ja) uses other for all counts.

CLDR provides data describing the keys used by the locale, and the rules to decide which key to use for any given count.

Common mistake: the zero, one, and two keys do not align with counts of 0, 1, and 2, but rather with any number that behaves like those numbers grammatically in the language. For example, some locales use one for numbers that end in “1” (e.g., 1, 21, 151) but that don’t end in “11” (like 11, 111, 10311).

In ruby-i18n/i18n, pluralization is handled by the Pluralization backend. It expects to find a Proc describing the pluralization rules of the locale at i18n.plural.rule. It then calls the Proc with the count to get the pluralization key to use in the lookup.
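To make the "one is not the same as 1" point concrete, here is a sketch of a rule Proc and where it would sit in the translation data. The logic is my approximation of the CLDR rule for Russian, restricted to non-negative integers, so check it against CLDR before relying on it. RUSSIAN_RULE and RUSSIAN_DATA are names invented for this example.

```ruby
# A callable that takes the count and returns a plural key.
# Approximates the CLDR rule for Russian; integers only.
RUSSIAN_RULE = lambda do |n|
  mod10 = n % 10
  mod100 = n % 100

  if mod10 == 1 && mod100 != 11
    :one   # 1, 21, 151, ... but not 11, 111
  elsif (2..4).cover?(mod10) && !(12..14).cover?(mod100)
    :few   # 2-4, 22-24, ... but not 12-14
  else
    :many  # 0, 5-20, 25-30, ...
  end
end

# The shape of the data, with the rule stored under i18n.plural.rule.
RUSSIAN_DATA = { ru: { i18n: { plural: { rule: RUSSIAN_RULE } } } }.freeze

rule = RUSSIAN_DATA.dig(:ru, :i18n, :plural, :rule)
rule.call(1)    # => :one
rule.call(21)   # => :one
rule.call(11)   # => :many
rule.call(3)    # => :few
```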

Lateral Inheritance

Not all pluralization contexts will have all of the keys needed for the rules of the locale. They could be missing, or excluded to reduce redundancy. If a key is missing, the lookup should attempt to use the other key. If the other key is also missing, only then does the lookup fall back to an ancestor locale.

This is part of what CLDR calls “Lateral Inheritance”. It’s a little more complicated than I’ve described (dealing with genders and cases, as well as pluralization).
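The simplified version of the rule described above (requested key, then other in the same locale, then the ancestor) might be sketched like this. It ignores genders and cases entirely, and pluralized_lookup and the placeholder data are hypothetical, not anything from ruby-i18n/i18n.

```ruby
# entries_by_locale: hypothetical { locale => { plural_key => string } } data.
# chain: the locale's fallback chain, most specific first.
def pluralized_lookup(entries_by_locale, chain, plural_key)
  chain.each do |locale|
    entry = entries_by_locale[locale]
    next if entry.nil?

    # Lateral step first: stay in this locale if it has the key or :other.
    return entry[plural_key] if entry.key?(plural_key)
    return entry[:other] if entry.key?(:other)
  end
  nil
end

entries = {
  "fr-CA" => { other: "fr-CA other" },
  "fr"    => { one: "fr one", other: "fr other" },
}

pluralized_lookup(entries, ["fr-CA", "fr"], :one)  # => "fr-CA other"
pluralized_lookup(entries, ["fr"], :one)           # => "fr one"
```

The first call returns the fr-CA other string rather than the fr one string, because other in the requested locale is tried before any ancestor.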

`Symbol` resolving

When the value found in a lookup is a Symbol, ruby-i18n/i18n will perform another lookup with that Symbol as the key, and return that instead.

This is how they accommodate CLDR’s concept of aliases.

If you don’t want this behaviour, and expect your lookup to return a Symbol, pass resolve: false as an option on the lookup.
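A toy version of Symbol resolving, to show what the resolve option changes. The lookup_alias helper and the ALIAS_DATA hash are invented for this sketch and operate on a flat Hash; the real library resolves against its backend.

```ruby
# Hypothetical flat translation data; :salutation is an alias for :greeting.
ALIAS_DATA = { greeting: "Hello!", salutation: :greeting }.freeze

# If the value is a Symbol and resolving is on, look that Symbol up in turn.
def lookup_alias(data, key, resolve: true)
  value = data[key]
  return value unless resolve && value.is_a?(Symbol)

  lookup_alias(data, value, resolve: resolve)
end

lookup_alias(ALIAS_DATA, :salutation)                  # => "Hello!"
lookup_alias(ALIAS_DATA, :salutation, resolve: false)  # => :greeting
```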

`Proc` resolving

When the value found in a lookup is a Proc, ruby-i18n/i18n will call the Proc with the same options given to the lookup, and return the result.

If you don’t want this behaviour, and expect your lookup to return a Proc, pass resolve: false as an option on the lookup. The Pluralization backend does this to return the i18n.plural.rule Proc before calling it itself.

My [unfounded] suspicion is that this feature was written for the Pluralization backend, but it never used it, and no one actually uses Proc resolving.
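Proc resolving, as described above, can be sketched in the same toy style. lookup_proc and PROC_DATA are invented names, and passing the key as the Proc's first argument is an assumption of this sketch rather than something I'm asserting about the library.

```ruby
# Hypothetical data whose value is a Proc instead of a String.
PROC_DATA = {
  shout: ->(key, options) { "#{key}: #{options[:name].upcase}!" },
}.freeze

# If the value is a Proc and resolving is on, call it with the lookup options.
def lookup_proc(data, key, resolve: true, **options)
  value = data[key]
  return value unless resolve && value.is_a?(Proc)

  value.call(key, options)
end

lookup_proc(PROC_DATA, :shout, name: "michael")         # => "shout: MICHAEL!"
lookup_proc(PROC_DATA, :shout, resolve: false).class    # => Proc
```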

Global configuration options

ruby-i18n/i18n has a number of global configuration options you can set through the I18n module. The options themselves are implemented as class variables on the I18n::Config class.

Aside: On the subject of class variables

Many (all?) of the global config options are implemented as class variables on the I18n::Config class.

When I asked several Ruby experts about this, their sentiment was:

Some things are really puzzling in ruby-i18n…🤷

I’ve had to work around this design choice a few times.

`I18n.default_locale` (defaults to `en`)

I18n.default_locale is perhaps the easiest configuration to misunderstand. Indeed, the original design of I18n.default_locale was incorrect, which has not helped the situation.

Historical context

TODO

`I18n.fallbacks`

I18n.fallbacks is a lookup that takes a locale and returns an Array containing the locale, and all of the locale’s ancestors.

For example, if you were following the CLDR standard:

> I18n.fallbacks[:"zh-Hant-HK"]
[:"zh-Hant-HK", :"zh-Hant", :root]

Meaning: “If you don’t find the data you’re looking for in zh-Hant-HK, fall back to zh-Hant and finally root.”

I18n.fallbacks is used by the Fallbacks backend to determine the locales to fall back through.

The default I18n.fallbacks object doesn’t do anything though, and can be configured in strange and incorrect ways:

I18n.fallbacks = [nil] # A historical hack to work around the issue in https://github.com/ruby-i18n/i18n/pull/415. Can be removed.
I18n.fallbacks = [:fr] # `fr` is a fallback for all lookups; edge case bugs!
I18n.fallbacks = [I18n.default_locale] # the default locale is a fallback for all lookups; edge case bugs!

None of these are what you want. At Shopify, we implemented our own fallbacks object that complies with the CLDR spec, and use that:

> I18n.fallbacks = ShopifyI18n::Cldr::Fallbacks.new
> I18n.fallbacks[:"zh-Hant-HK"]
[:"zh-Hant-HK", :"zh-Hant", :root]

Aside: The default I18n.fallbacks object caches the results of lookups:

> I18n.fallbacks
{}
> I18n.fallbacks[:"zh-Hant-HK"]
[:"zh-Hant-HK", :"zh-Hant", :root]
> I18n.fallbacks
{ :"zh-Hant-HK" => [:"zh-Hant-HK", :"zh-Hant", :root] }
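The caching behaviour is easy to picture with a small wrapper. This is a sketch of the idea only, not the gem's implementation: CachingFallbacks is a made-up class, and the block passed to it uses naive hyphen chopping (so it ends in :zh, not :root) purely to keep the example short.

```ruby
# Computes a locale's chain on first access and remembers it afterwards.
class CachingFallbacks
  def initialize(&compute)
    @compute = compute
    @cache = {}
  end

  def [](locale)
    locale = locale.to_sym
    @cache[locale] ||= @compute.call(locale)
  end

  # A copy of what has been cached so far.
  def to_h
    @cache.dup
  end
end

# Naive hyphen chopping stands in for real CLDR parent data here.
fallbacks = CachingFallbacks.new do |locale|
  parts = locale.to_s.split("-")
  parts.length.downto(1).map { |n| parts.first(n).join("-").to_sym }
end

fallbacks.to_h.empty?     # => true
fallbacks[:"zh-Hant-HK"]  # => [:"zh-Hant-HK", :"zh-Hant", :zh]
fallbacks.to_h.keys       # => [:"zh-Hant-HK"]
```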

Important: The fallbacks for any locale should only be locales that a user will likely understand. Notably, I18n.fallbacks should not be used to implement a “default to en if we don’t have any data” behaviour. The code makes assumptions about the fallback locales that would result in en incorrectly being used. If you want such a behaviour, it should be implemented using I18n.exception_handler.

Historical context

Originally, I18n.fallbacks defaulted to [I18n.default_locale]; however, this was later changed to [].

`I18n.raise_on_missing_translations`

Just set it to true.

TODO

`I18n.exception_handler`

TODO

`I18n.available_locales`

An Array of locales for which translations are available. Unless you explicitly set these through I18n.available_locales=, the call will be delegated to the backend.

Simple#available_locales (the default backend) computes the locales from the translations loaded from I18n.load_path.

Important: svenfuchs/rails-i18n only loads its locale information for locales in I18n.available_locales. This means that plural rules are only available for those locales, so you cannot pluralize in locales outside of that list. If you are using the Fallbacks backend, make sure that I18n.available_locales includes all of the fallback locales too.

I18n.locale_available? uses available_locales_set

`I18n.enforce_available_locales`

Setting this to true will cause I18n::InvalidLocale to be raised whenever a translation is requested for a locale not in I18n.available_locales.

I’ve never found it useful to have enabled, and always set it to false.

Lookup options

`scope`

From the docs:

Scope can be either a single key, a dot-separated key or an array of keys or dot-separated keys. Keys and scopes can be combined freely. So these examples will all look up the same short date format:

I18n.t 'date.formats.short'
I18n.t 'formats.short', :scope => 'date'
I18n.t 'short', :scope => 'date.formats'
I18n.t 'short', :scope => %w(date formats)
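One way to see why those four calls are equivalent is to flatten scope and key into a single key path. The helper below is my own sketch of that normalization, not the library's code, and normalize_key_path is a hypothetical name.

```ruby
# Flattens scope + key into one list of Symbols, splitting on the separator.
def normalize_key_path(key, scope: nil, separator: ".")
  [scope, key]
    .flatten
    .compact
    .flat_map { |part| part.to_s.split(separator) }
    .map(&:to_sym)
end

normalize_key_path("date.formats.short")                  # => [:date, :formats, :short]
normalize_key_path("formats.short", scope: "date")        # => [:date, :formats, :short]
normalize_key_path("short", scope: "date.formats")        # => [:date, :formats, :short]
normalize_key_path("short", scope: %w(date formats))      # => [:date, :formats, :short]
```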

`locale`

By default, lookups will be done with the locale from I18n.locale. If you need to override that, most calls accept a locale parameter you can use.

`fallback`

You can disable the use of locale fallbacks for a lookup by setting this to false.

`default`

A value to return if the translation isn’t found. If the default is a Symbol, then it will be resolved (unless resolve is set to false).

Important: This should not be used to implement a “default to en if we don’t have any data” behaviour. If you want such a behaviour, it should be implemented using I18n.exception_handler.

`deep_interpolation`

Whether or not to perform “deep interpolation”. Deep interpolation will use the provided values to interpolate all child values of the resolved value.

It’s not clear to me whether this would ever be a good idea to use. It would be a strange situation where you’d want multiple strings to share the same interpolation values.
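For what it's worth, the idea of deep interpolation can be sketched as a recursive walk over the resolved value. This is plain Ruby using Kernel#format, not the gem's implementation, and deep_interpolate and the sample data are made up for the illustration.

```ruby
# Applies the same interpolation values to every String nested in value.
def deep_interpolate(value, values)
  case value
  when String then format(value, values)
  when Hash   then value.transform_values { |child| deep_interpolate(child, values) }
  when Array  then value.map { |child| deep_interpolate(child, values) }
  else value
  end
end

# Hypothetical subtree that a lookup might resolve to.
email = {
  subject: "Welcome, %{name}",
  body: { greeting: "Hi %{name}!", lines: ["Thanks for joining, %{name}."] },
}

result = deep_interpolate(email, name: "Ann")
result[:subject]          # => "Welcome, Ann"
result[:body][:greeting]  # => "Hi Ann!"
result[:body][:lines]     # => ["Thanks for joining, Ann."]
```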

`resolve`

resolve is a boolean, defaulting to true, that determines whether Symbol/Proc resolving should happen.

`separator`

Instead of specifying the nesting of the keys using I18n.default_separator, one can override the separator used by passing in separator.

I18n.t("foo;bar;baz", separator: ";")

`cascade`

If you want to use the Cascade backend, you need to include it like any other backend and then pass the cascade: true option in each lookup.

I18n.t(:'foo.bar.baz', :cascade => true)

Internal options

There are also options that are reserved for internal use.

`fallback_in_progress`

Meant to prevent infinite recursion in Backend::Fallbacks.

`fallback_original_locale`

When a key lookup results in a Symbol, resolving of that Symbol should start from the original locale, not whichever locale was last in the fallback chain. fallback_original_locale stores that original locale.

`format`

When I18n.l is used, the key is used to look up a “format” which is then stored in the format option. AFAICT, it’s unused after that.

`object`

When I18n.l is used, the object being formatted (i.e., Date, Time, DateTime) gets stored in object so that its fields can be interrogated as part of the formatting.

Other bits

`i18n` reserved key

ruby-i18n/i18n reserves the i18n key for its own use. For example, it expects i18n.plural.rule to be a Proc that the Pluralization backend can use for its purposes.
