As of 2022-08-30, ruby-i18n/i18n
is the ninth most popular Ruby gem.
Yet, its documentation, performance, and code quality leave a lot to be desired. While I contribute bug reports and fixes whenever I can, I found I needed a place to jot down the issues, history, and surprises I’ve discovered while working with this library.
This document will serve as a place for me to collect my notes about the library. It may or may not end up being valuable to others.
Features
Backends
Backends are bits of composable functionality that are meant to be combined by calling include on the Simple backend.
I just use the Fallbacks and Pluralization backends:
I18n::Backend::Simple.include(
  I18n::Backend::Fallbacks,
  I18n::Backend::Pluralization,
)
Interpolation
i18n supports two different interpolation syntaxes:
- Modern: %{} syntax (e.g., greeting: Hello %{name}!)
- Ancient: Ruby 1.9 strformat syntax (e.g., greeting: Hello %<name>.d) # TODO, make the examples match; link to strformat
Always use the former.
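Both placeholder styles also exist in Ruby's own Kernel#format, so plain Ruby is enough to see how they differ. This is only a sketch of the syntax: i18n has its own interpolation code, and the constant names and values below are made up for illustration.

```ruby
# Plain Ruby, no i18n required. VALUES and the templates are illustrative only.
VALUES = { name: "Ada", price: 3.5 }.freeze

# %{name} substitutes the value as-is.
MODERN = format("Hello %{name}!", VALUES)

# %<name>s needs a format directive after the closing bracket (s, d, f, ...).
ANCIENT = format("Hello %<name>s!", VALUES)

# The directive can carry formatting, which is the one thing %{} cannot do.
PRICE = format("Total: %<price>.2f", VALUES)

puts MODERN, ANCIENT, PRICE
```

The second style makes it easy to forget the trailing directive, which is one more reason to prefer %{}.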
Locale codes
Locale codes can be pretty much anything, but the most common form is an ISO 639 language code (e.g., en
for “English”) optionally followed by an ISO 3166 region code (e.g., en-CA
for “English, as written in Canada”).
But there are other forms that are also very common.
For example, it’s common to have locale codes contain a script:
- zh-Hant-TW (“Chinese, written in the Traditional Chinese script, as written in Taiwan”)
- uz-Cyrl (“Uzbek, written in Cyrillic script”)
- uz-Arab-AF (“Uzbek, written in Arabic script, as written in Afghanistan”)
ISO 15924 defines the valid script codes, and CLDR defines the default script as Latn
(i.e., Latin script).
IETF RFC 5646 / BCP 47 defines “Language tags” as having up to 6 segments:
langtag = language
          ["-" script]
          ["-" region]
          *("-" variant)
          *("-" extension)
          ["-" privateuse]
In practice, you’ll see all sorts of fun locale codes requested by users, both valid and invalid (e.g., en-PIRATE), and the library will need to determine which locale is the best one to begin the search for translation data from. CLDR defines an algorithm and data to help with this determination.
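To make the segment structure concrete, here is a minimal sketch that splits a locale code into language, script, region, and variants. It checks only the shape of each segment, not whether the subtags are registered, and it ignores extensions, private-use segments, and grandfathered tags. LANGTAG_SHAPE and parse_langtag are hypothetical names, not part of ruby-i18n/i18n.

```ruby
# Simplified shape check for the first four segments of a language tag.
LANGTAG_SHAPE = /\A
  (?<language>[a-z]{2,3})
  (?:-(?<script>[a-z]{4}))?
  (?:-(?<region>[a-z]{2}|\d{3}))?
  (?<variants>(?:-(?:[a-z0-9]{5,8}|\d[a-z0-9]{3}))*)
\z/xi

# Returns a Hash of segments in conventional casing, or nil if the shape is wrong.
def parse_langtag(tag)
  match = LANGTAG_SHAPE.match(tag)
  return nil unless match

  {
    language: match[:language].downcase,
    script: match[:script]&.capitalize,
    region: match[:region]&.upcase,
    variants: match[:variants].split("-").reject(&:empty?).map(&:downcase),
  }
end

p parse_langtag("zh-Hant-TW")
p parse_langtag("en-150")
p parse_langtag("ca-ES-VALENCIA")
```

A shape check like this is only a first filter; deciding which known locale best matches a request still needs the CLDR data mentioned above.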
Fallbacks
There are thousands of locales that a global company will see users requesting, and they can generally be thought of as forming a hierarchy from the most specific to the most general:
- ca-ES-VALENCIA (“Catalan, as written in Valencia, in Spain” aka. Valencian)
- ca-ES (“Catalan, as written in Spain”)
- ca (“Catalan”)
You won’t have perfect localization data for every locale at every level of granularity. In order to gracefully degrade the UI, you want to provide something “good enough” for the user when you don’t have data matching their specific locale. This is done by defining locale fallbacks.
Even if you had perfect and complete data for every locale, there is so much overlap in the data between locales and their ancestors that you wouldn’t want to keep n copies of the data in memory. By pushing all the duplicated data into the most general ancestor possible, you can save a lot of memory and disk space, at the cost of increased complexity and CPU use in the lookup code.
root
locale
CLDR defines a root
locale that is the ancestor of all locales (and thus contains data that is common to many locales).
Ancestors
Note that in many cases, finding the ancestor of a locale can be done naively by chopping off the last segment of the locale (i.e., everything after the last -). However, there are many examples where this is not the case, and “hyphen chopping” gives the wrong result.
The fallback chain for en-CH
is:
- en-CH (“English, as written in Switzerland”)
- en-150 (“English, as written in Europe”. Note that 150 is defined by UN M.49, not ISO 3166)
- en (“English”)
- root
The fallback chain for zh-Hant-TW
is:
- zh-Hant-TW (“Chinese, written in the Traditional Chinese script, as written in Taiwan”)
- zh-Hant (“Chinese, written in the Traditional Chinese script”)
- root
Note that we don’t fall back to zh here, since zh is written in Simplified Chinese characters, while zh-Hant is written using Traditional Chinese characters. So it wouldn’t make sense to fall back to zh in this case because you’d be switching scripts if you did.
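The two chains above can be reproduced with a small sketch: chop the last hyphenated segment by default, but consult a table of explicit parents first. The table here contains only the two exceptions discussed in this section; real parent data comes from CLDR. PARENT_OVERRIDES, parent_of, and fallback_chain are hypothetical names.

```ruby
# Explicit parents for locales where hyphen chopping gives the wrong answer.
# Illustrative subset only; CLDR ships the full data.
PARENT_OVERRIDES = {
  "en-CH" => "en-150",
  "zh-Hant" => "root",
}.freeze

# Returns the parent locale, or nil once we have reached root.
def parent_of(locale)
  return nil if locale == "root"

  PARENT_OVERRIDES.fetch(locale) do
    head, separator, _last_segment = locale.rpartition("-")
    separator.empty? ? "root" : head
  end
end

# Walks from the requested locale up to root.
def fallback_chain(locale)
  chain = []
  while locale
    chain << locale
    locale = parent_of(locale)
  end
  chain
end

p fallback_chain("en-CH")      # ["en-CH", "en-150", "en", "root"]
p fallback_chain("zh-Hant-TW") # ["zh-Hant-TW", "zh-Hant", "root"]
```

Without the override table, the same code would produce en-CH, en, root and zh-Hant-TW, zh-Hant, zh, root, which is exactly the “hyphen chopping” mistake.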
The Fallbacks
backend simply falls back through each ancestor of the locale until the lookup succeeds.
Which locales to fall back through for any given locale is defined by I18n.fallbacks
.
CLDR provides data that can be used by I18n.fallbacks
to determine the fallback chain.
Pluralization
The way that words are pluralized varies dramatically across locales.
There are currently six different pluralization keys defined in CLDR: few
, many
, one
, other
, two
, and zero
.
Each language uses a different subset of the keys, and each language applies its own rules for when a given key is used. All languages use at least the other key: a language without grammatical plural forms (e.g., zh, ja) uses the other key for all counts.
CLDR provides data describing the keys used by the locale, and the rules to decide which key to use for any given count.
Common mistake: the zero, one, and two keys do not align with counts of 0, 1, and 2, but rather with any number that behaves like those numbers grammatically in the language. For example, some locales use one for numbers that end in “1” (e.g., 1, 21, 151) but not in “11” (e.g., 11, 111, 10311).
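A rule of that shape is easy to write down as a lambda. The sketch below covers only the one/other split for integer counts and is not the complete rule of any particular locale; ENDS_IN_ONE_RULE is a hypothetical name.

```ruby
# Returns :one for integers ending in 1 but not in 11, and :other otherwise.
# Simplified: real CLDR rules for such locales also define keys like few and many.
ENDS_IN_ONE_RULE = lambda do |count|
  n = count.abs
  n % 10 == 1 && n % 100 != 11 ? :one : :other
end

[1, 21, 151, 11, 111, 10311, 0, 2].each do |count|
  puts "#{count} => #{ENDS_IN_ONE_RULE.call(count)}"
end
```

Note that 21 and 151 land on one even though they are nowhere near a count of 1.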
In ruby-i18n/i18n
, pluralization is handled by the Pluralization
backend. It expects to find a Proc
describing the pluralization rules of the locale at i18n.plural.rule
. It then calls the Proc
with the count
to get the pluralization key to use in the lookup.
Lateral Inheritance
Not all pluralization contexts will have all of the keys needed for the rules of the locale. They could be missing, or excluded to reduce redundancy.
If a key is missing, the lookup should attempt to use the other
key. If the other
key is also missing, only then does the lookup fall back to an ancestor locale.
This is part of what CLDR calls “Lateral Inheritance”. It’s a little more complicated than I’ve described (dealing with genders and cases, as well as pluralization).
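The lookup order described above (requested key, then other, and only then the ancestor) can be sketched over plain hashes. This is a toy model of the pluralization part only, not the library's code; lateral_lookup, the translation data, and the xx / xx-YY locales are invented for illustration.

```ruby
# For each locale in the chain, try the requested plural key, then :other,
# and only move on to the next (ancestor) locale when both are missing.
def lateral_lookup(translations, chain, key)
  chain.each do |locale|
    entry = translations[locale]
    next unless entry
    return entry[key] if entry.key?(key)
    return entry[:other] if entry.key?(:other)
  end
  nil
end

TOY_TRANSLATIONS = {
  "en-CA" => { one: "1 colour" },
  "en" => { one: "1 color", other: "%{count} colors" },
  "xx-YY" => { other: "things (xx-YY)" },
  "xx" => { few: "a few things (xx)" },
}.freeze

p lateral_lookup(TOY_TRANSLATIONS, %w[en-CA en], :one)   # found in en-CA
p lateral_lookup(TOY_TRANSLATIONS, %w[en-CA en], :other) # en-CA has neither key, so en is used
p lateral_lookup(TOY_TRANSLATIONS, %w[xx-YY xx], :few)   # xx-YY's :other wins over xx's :few
```

The last call is the interesting one: the ancestor has an exact match for few, but the more specific locale's other key is preferred.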
Symbol
resolving
When the value found in a lookup is a Symbol
, ruby-i18n/i18n
will perform another lookup with that Symbol
as the key, and return that instead.
This is how they accommodate CLDR’s concept of aliases.
If you don’t want this behaviour, and expect your lookup to return a Symbol
, set resolve: false
as an option on the lookup.
Proc
resolving
When the value found in a lookup is a Proc
, it will call the Proc
with the same options given to the lookup.
If you don’t want this behaviour, and expect your lookup to return a Proc
, set resolve: false
as an option on the lookup.
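The behaviour of both kinds of resolving, and of turning it off, can be modelled in a few lines of plain Ruby. This is a toy over a flat Hash, written from the description above rather than from the library's source; toy_lookup and TOY_DATA are hypothetical, and the real Proc call signature may differ.

```ruby
# Toy model: Symbols are looked up again, Procs are called with the lookup options,
# and resolve: false returns the raw value untouched.
def toy_lookup(data, key, resolve: true, **options)
  value = data[key]
  return value unless resolve

  case value
  when Symbol then toy_lookup(data, value, resolve: resolve, **options)
  when Proc then value.call(options)
  else value
  end
end

TOY_DATA = {
  greeting: :hello, # alias to another key
  hello: "Hello!",
  shout: ->(options) { options[:word].upcase },
}.freeze

p toy_lookup(TOY_DATA, :greeting)                 # follows the alias
p toy_lookup(TOY_DATA, :greeting, resolve: false) # returns the Symbol itself
p toy_lookup(TOY_DATA, :shout, word: "hi")        # calls the Proc
```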
The Pluralization
backend does this to return the i18n.plural.rule
Proc
before calling it itself.
My [unfounded] suspicion is that this feature was written for the Pluralization backend, but the backend never ended up using it, and no one actually uses Proc resolving.
Global configuration options
ruby-i18n/i18n
has a number of global configuration options you can set. The options themselves are implemented as class variables on the I18n
class.
Aside: On the subject of class variables
Many (all?) of the global config options are implemented as class variables on the I18n
class.
When I asked several Ruby experts about this, their sentiment was:
Some things are really puzzling in
ruby-i18n
…🤷
I’ve had to work around this design choice a few times.
I18n.default_locale
(defaults to en
)
I18n.default_locale
is perhaps the easiest configuration to misunderstand. Indeed, the original design of I18n.default_locale
was incorrect, which has not helped the situation.
Historical context
TODO
I18n.fallbacks
I18n.fallbacks
is a lookup that takes a locale and returns an Array
containing the locale, and all of the locale’s ancestors.
For example, if you were following the CLDR standard:
> I18n.fallbacks[:"zh-Hant-HK"]
[:"zh-Hant-HK", :"zh-Hant", :root]
Meaning “If you don’t find the data you’re looking for in zh-Hant-HK, fall back to zh-Hant and finally root.”
I18n.fallbacks
is used by the Fallbacks
backend to determine the locales to fall back through.
The default I18n.fallbacks
object doesn’t do anything though, and can be configured in strange and incorrect ways:
I18n.fallbacks = [nil] # A historical hack to work around the issue in https://github.com/ruby-i18n/i18n/pull/415. Can be removed.
I18n.fallbacks = [:fr] # `fr` is a fallback for all lookups; edge case bugs!
I18n.fallbacks = [I18n.default_locale] # the default locale is a fallback for all lookups; edge case bugs!
None of these are what you want. At Shopify, we implemented our own fallbacks object that complies with the CLDR spec, and use that:
> I18n.fallbacks = ShopifyI18n::Cldr::Fallbacks.new
> I18n.fallbacks[:"zh-Hant-HK"]
[:"zh-Hant-HK", :"zh-Hant", :root]
Aside: The default I18n.fallbacks
object caches the results of lookups:
> I18n.fallbacks
{}
> I18n.fallbacks[:"zh-Hant-HK"]
[:"zh-Hant-HK", :"zh-Hant", :root]
> I18n.fallbacks
{ :"zh-Hant-HK" => [:"zh-Hant-HK", :"zh-Hant", :root] }
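One way to get this kind of caching is a Hash with a default block that computes the chain on first access and stores it. The sketch below shows that pattern only; it is not a copy of the library's implementation, and TOY_PARENTS, compute_chain, and TOY_FALLBACKS are hypothetical names with a made-up two-entry parent table.

```ruby
# Illustrative parent table; real data would come from CLDR.
TOY_PARENTS = { :"zh-Hant-HK" => :"zh-Hant", :"zh-Hant" => :root }.freeze

# Follows the parent table until a locale without a parent is reached.
def compute_chain(locale)
  chain = [locale]
  chain << TOY_PARENTS[chain.last] while TOY_PARENTS.key?(chain.last)
  chain
end

# The default block runs only on a cache miss and stores its result.
TOY_FALLBACKS = Hash.new { |cache, locale| cache[locale] = compute_chain(locale) }

p TOY_FALLBACKS                 # empty until the first lookup
p TOY_FALLBACKS[:"zh-Hant-HK"]  # computed and stored
p TOY_FALLBACKS                 # now contains the cached chain
```

A side effect worth knowing about: anything that mutates the returned Array also mutates the cached entry.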
Important: The fallbacks for any locale should only be locales that a user will likely understand. Notably, I18n.fallbacks
should not be used to implement a “default to en
if we don’t have any data” behaviour. The code makes assumptions about the fallback locales that would result in en
incorrectly being used. If you want such a behaviour, it should be implemented using I18n.exception_handler
.
Historical context
Originally, I18n.fallbacks
defaulted to [I18n.default_locale]
, however this was changed to []
.
I18n.raise_on_missing_translations
Just set it to true
.
TODO
I18n.exception_handler
TODO
I18n.available_locales
An Array
of locales for which translations are available.
Unless you explicitly set these through I18n.available_locales=
the call will be delegated to the backend.
Simple#available_locales
(the default backend) computes the locales from the translations loaded from I18n.load_path
.
Important: svenfuchs/rails-i18n only loads its locale information for locales in I18n.available_locales. This means that plural rules are only available for those locales, which means that you cannot pluralize in locales outside of that list. If you are using the Fallbacks backend, make sure that I18n.available_locales includes all of the fallback locales too.
I18n.locale_available?
uses available_locales_set
I18n.enforce_available_locales
Setting this to true
will cause I18n::InvalidLocale
to be raised whenever a translation is requested for a locale not in I18n.available_locales.
I’ve never found it useful to have enabled, and always set it to false.
Lookup options
scope
From the docs:
Scope can be either a single key, a dot-separated key or an array of keys
or dot-separated keys. Keys and scopes can be combined freely. So these
examples will all look up the same short date format:
I18n.t 'date.formats.short'
I18n.t 'formats.short', :scope => 'date'
I18n.t 'short', :scope => 'date.formats'
I18n.t 'short', :scope => %w(date formats)
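The reason these four calls are equivalent is that scope and key end up flattened into one list of key segments. Here is a rough sketch of that flattening in plain Ruby; toy_normalize_keys is a hypothetical helper written for illustration, not the library's own normalization code.

```ruby
# Combines scope and key, splits every part on the separator,
# and returns the full key path as an Array of Symbols.
def toy_normalize_keys(key, scope: nil, separator: ".")
  [scope, key]
    .flatten
    .compact
    .flat_map { |part| part.to_s.split(separator) }
    .map(&:to_sym)
end

p toy_normalize_keys("date.formats.short")
p toy_normalize_keys("formats.short", scope: "date")
p toy_normalize_keys("short", scope: "date.formats")
p toy_normalize_keys("short", scope: %w[date formats])
```

All four calls print the same path, date, formats, short, which is then walked through the nested translation data.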
locale
By default, lookups will be done with the locale from I18n.locale
. If you need to override that, most calls accept a locale
parameter you can use.
fallback
You can disable the use of locale fallbacks by setting this to false.
default
A value to return if the translation isn’t found. If the default is a Symbol
, then it will be resolved (unless resolve
is set to false
)
Important: This should not be used to implement a “default to en
if we don’t have any data” behaviour. If you want such a behaviour, it should be implemented using I18n.exception_handler
.
deep_interpolation
Whether or not to perform “deep interpolation”. Deep interpolation will use the provided values to interpolate all child values of the resolved value.
It’s not clear to me whether this would ever be a good idea to use. It would be a strange situation where you’d want multiple strings to share the same interpolation values.
resolve
resolve
is a boolean, defaulting to true
, that determines whether Symbol
/Proc
resolving should happen.
separator
Instead of specifying the nesting of the keys using I18n.default_separator
, one can override the separator used by passing in separator
.
I18n.t("foo;bar;baz", separator: ";")
cascade
If you want to use the Cascade
backend, you need to include it like any other backend and then pass the cascade: true
option in each lookup.
I18n.t(:'foo.bar.baz', :cascade => true)
Internal options
There are also options
that are reserved for internal use.
fallback_in_progress
Meant to prevent infinite recursion in Backend::Fallbacks
fallback_original_locale
When a key lookup results in a Symbol
, resolving of that Symbol
should start from the original locale, not whichever locale was last in the fallback chain.
fallback_original_locale
stores that original locale.
format
When I18n.l
is used, the key is used to look up a “format” which is then stored in the format
option.
AFAICT, it’s unused after that.
object
When I18n.l
is used, the object being formatted (i.e., Date
, Time
, DateTime
) gets stored in object
so that its fields can be interrogated as part of the formatting.
Other bits
i18n
reserved key
ruby-i18n/i18n
reserves the i18n
key for its own use. For example, it expects i18n.plural.rule
to be a Proc
that the Pluralization
backend can use for its purposes.