Internationalization(i18n) and Localization(L10n)
- Localization Process
- Develop a localization strategy
- Region and Language
- Add new region/language/service
- Incremental Localization
- Management of translation
- Localized implementation
- Localized multilingual implementation
- The Challenges of Localization
- Do you need to consider SEO
- Localization of product design
- Localization under Microservices
- Localized technical or business standards development
- Development Environment and Business Processes
- Static text processing
- Whether to store language and region settings
- Localization of back-end services
- Localization of third-party services and resources
- Release Process
- Localization under micro front-end architecture
- Localized testing
- Localization Platform
A successful product has to go through many stages before it goes global. From the perspective of software development, there are two main processes: internationalization and localization.

A language environment (locale) is the use of a specific language or language variant within a country or geographic region. It determines the format and parsing of dates, times, numbers and currencies, as well as measurement units and the translated names of time zones, languages, countries and regions. Internationalization enables a piece of software to handle multiple language environments; localization enables it to support one specific regional language environment. Going global therefore means first internationalizing the software and then localizing it so that it supports a specific language environment in a specific region.
Because the two words are so long, they are often abbreviated as i18n (there are 18 letters between the i and the n in "internationalization") and L10n, respectively. The capital L in L10n keeps the lowercase l from being confused with the digit 1 or with the i in i18n. (Wikipedia)
The Unicode character set can represent almost every character known to man, using code points ranging from 0 to 10FFFF (hexadecimal), so a code point requires at least 21 bits of storage. The UTF-8 text encoding maps Unicode code points onto a stream of 8-bit bytes and is compatible with ASCII data processing systems. UTF stands for Unicode Transformation Format.
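The numbers above can be checked with a few lines of Python. This is only a small sketch; the sample characters are arbitrary illustrations and do not come from the article.

```python
# Largest Unicode code point and the number of bits needed to store it.
MAX_CODE_POINT = 0x10FFFF
print(MAX_CODE_POINT.bit_length())  # 21


def utf8_profile(ch: str) -> tuple[str, int]:
    """Return the code point (as U+XXXX) and the UTF-8 byte length of one character."""
    return f"U+{ord(ch):04X}", len(ch.encode("utf-8"))


# UTF-8 is a variable-length encoding: 1 to 4 bytes per code point.
for ch in ["A", "é", "中", "😀"]:
    print(ch, *utf8_profile(ch))

# ASCII text is byte-for-byte identical when encoded as UTF-8.
assert "i18n".encode("utf-8") == "i18n".encode("ascii")
```

The four sample characters take 1, 2, 3 and 4 bytes respectively, which is how UTF-8 stays compact for ASCII-heavy text while still covering the full code point range.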
Since 2009, UTF-8 has been the dominant encoding form on the World Wide Web. As of November 2019, UTF-8 is used in 94.3% of all web pages (some of which are ASCII only, as it is a subset of UTF-8), and 96% of the top 1000 pages. Therefore, UTF-8 encoding is recommended for internationalization.
The article "Internationalization of IT products is never enough to 'support English'" mentions that in some GBK-encoded texts, many characters that "look the same" are actually written slightly differently. However, in order to save space, Unicode assigns them the same code point.

How can we tell apart characters that share the same code point (that is, the same character displayed with a different glyph)? This requires the help of the locale.
When counting Chinese characters, the count is usually made by character form: the simplified, traditional, variant, new and old forms of a character that carry the same sound and meaning are each counted. This way of counting is in fact counting variants. Therefore, the number of glyphs included in large dictionaries has long been wrongly regarded as the size of the Chinese character system. (Wikipedia)
A locale is the language environment of the software at runtime, which includes Language, Territory and Codeset. A locale is written in the following format: Language[_Territory[.Codeset]], for example en_US.UTF8. In Linux, a locale consists of these parts.
If your locale is en_US.UTF8, you must change it to zh_CN.UTF8 for the system to present Chinese text and formats. All supported locales are stored in the /usr/share/locale directory of the macOS operating system.
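To make the Language[_Territory[.Codeset]] format concrete, here is a minimal parsing sketch. The `Locale` type and the `parse_locale` function are hypothetical names introduced for illustration, and the regular expression covers only this simple format, not every identifier a real system accepts.

```python
import re
from typing import NamedTuple, Optional


class Locale(NamedTuple):
    language: str
    territory: Optional[str]
    codeset: Optional[str]


# Mirrors Language[_Territory[.Codeset]]: the codeset may only follow a territory.
_LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:_(?P<territory>[A-Za-z]{2})(?:\.(?P<codeset>[A-Za-z0-9-]+))?)?$"
)


def parse_locale(identifier: str) -> Locale:
    """Split a locale identifier such as 'zh_CN.UTF8' into its parts."""
    match = _LOCALE_RE.match(identifier)
    if match is None:
        raise ValueError(f"not a Language[_Territory[.Codeset]] identifier: {identifier!r}")
    territory = match["territory"]
    return Locale(
        language=match["language"].lower(),
        territory=territory.upper() if territory else None,
        codeset=match["codeset"],
    )


print(parse_locale("zh_CN.UTF8"))
print(parse_locale("en_US"))
print(parse_locale("it"))
```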
The same language may differ subtly between countries and regions; for example, there are some differences between American English and British English. The same country may also have multiple languages or scripts; for example, China uses both Simplified and Traditional Chinese. In the introduction to locale above we saw the use of language_region to express the exact language of a country. For countries and languages, ISO has developed the corresponding standard codes ISO 3166-1 and ISO 639-1.
The browser uses language codes to send the names of the languages it accepts in the Accept-Language HTTP header. For example: it, de-at, es, pt-br.
GNU gettext is the GNU internationalization and localization (i18n) library, which is often used to write multilingual (M17N) programs. Many programming languages such as C, C++, Python, PHP, Rust and Elixir support the use of gettext from within the language.
The following is the flow of how Java calls gettext to complete internationalization, in which translations are loaded through the ResourceBundle class.

The following diagram shows the flow of internationalization in PHP using gettext.

The directory structure of an Elixir project that implements i18n with gettext:
priv/gettext
├─ en_US
│  └─ LC_MESSAGES
│     ├─ default.po
│     └─ errors.po
└─ it
   └─ LC_MESSAGES
      ├─ default.po
      └─ errors.po
Using gettext is a typical way of making an application support internationalization.

A typical localization flow chart is shown in the figure above, and several parties are involved.
This part starts with some basic preliminary considerations:
- Whether localization is done in code by opening a PR
- How each service development team does incremental localization
- Synchronization of knowledge on localization among teams
- Use of industry standard libraries (e.g. Unicode Common Locale Data Repository CLDR) for language-specific formats for dates, times, time zones, numbers, and currencies
- The locale identifier is in language_region format, e.g. en_US for United States English.

The challenges of localization mainly arise from differences in language, culture, writing habits and laws between geographic areas, and fall into the following categories:
- Currency: Currency formatting must take into account the currency symbol, the position of the currency symbol and the position of the minus sign. Most currencies use the same decimal separator and thousands separator as ordinary numbers in the locale settings. However, in some places this is not the case; for example, in Switzerland the decimal separator for the Swiss franc is a period.
- Date and time: Internationalizing dates and times involves not only the locale (e.g. the localized representation of calendar items such as days of the week and months) but also the time zone (the offset from UTC/GMT). Time zones are defined politically as well as geographically. For example, China geographically spans 5 time zones but uses a single unified time zone. Many other countries observe daylight saving time, so the difference between Berlin time and Beijing time changes: it is 7 hours in winter and 6 hours during daylight saving time.
- Numbers: The way numbers are represented also differs between countries and regions. Factors that affect the representation include the digit characters, the numeric symbols, the type of number, etc.
- Weight/length/physical units: Because units differ, the same set of data needs to be converted for each regional version.
- Business-related units of measurement: For example, different countries have different billing rules for their products. This requires support from business staff to identify the corresponding rules and provide the conversions.
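The Berlin/Beijing example in the list above can be reproduced with Python's standard `zoneinfo` module. This is a sketch under two assumptions: the IANA time zone database is available on the machine, and the two sample dates (chosen here, not taken from the article) fall in winter time and daylight saving time respectively.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def hours_between(instant: datetime, ahead: str, behind: str) -> float:
    """Return how many hours zone `ahead` is ahead of zone `behind` at `instant`."""
    offset_ahead = instant.astimezone(ZoneInfo(ahead)).utcoffset()
    offset_behind = instant.astimezone(ZoneInfo(behind)).utcoffset()
    return (offset_ahead - offset_behind).total_seconds() / 3600


# Illustrative instants: one in January (winter time), one in July (daylight saving time).
winter = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
summer = datetime(2024, 7, 15, 12, 0, tzinfo=timezone.utc)

print(hours_between(winter, "Asia/Shanghai", "Europe/Berlin"))  # 7.0
print(hours_between(summer, "Asia/Shanghai", "Europe/Berlin"))  # 6.0
```

Storing instants in UTC and converting with named zones at display time avoids hard-coding an offset that is only correct for part of the year.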
If you are localizing a consumer-facing (toC) website, you need to consider search engine optimization (SEO). The article "How to approach an international strategy" mentions some key points:
- A URL such as www.mysite.com/de/ would tell the user that the page is in German.
- A tag such as <link rel="alternate" href="http://example.com" hreflang="en-us" /> declares the language and region of an alternate version of the page.
Using a more localized design for the same content in different geographies can lead to better results, as mentioned in the article "Internationalization and Localization of Product Design" about the different presentation of Spotify's song covers in different countries.

The localization process for a single application is relatively simple from an architectural point of view. However, many applications nowadays use a microservice architecture, with multiple teams collaborating on development. If individual teams are responsible for localizing their respective services, there must be a unified localization committee to develop technical standards for localization.
Alternatively, a dedicated localization team implements localization and is responsible for solving the problems above. The project I am involved in falls into the latter category. Our team completed the localization of nearly a dozen microservice subsystems for the entire large system, and these systems were owned by multiple teams in several large groups, so collaborating on such a cross-functional requirement (CFR) across teams is a complex task.
Prior to the implementation of localization, it is important to identify the relevant technical or operational standards, some of which are:
- Static text in the front end or back end can be extracted to files named by language identifier, e.g. en.json for static text in English and en_US.json for text specific to US English (e.g. units of measure, dates, numbers, currency, etc.).
- The language_region format is used uniformly in remote service calls (front-end calls to the back end, or back-end calls to other internal or external services), e.g. en_US means requesting the localized English version for the US region.
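One way to read the file-naming standard above is as a fallback chain: look in the region-specific file first, then in the language-wide file. The sketch below uses in-memory dictionaries as stand-ins for en.json and en_US.json; the keys, values and function names are hypothetical and only illustrate the lookup order.

```python
# Stand-ins for the parsed contents of en.json and en_US.json (hypothetical keys).
RESOURCES = {
    "en": {"greeting": "Hello", "date_format": "%Y-%m-%d"},
    "en_US": {"date_format": "%m/%d/%Y"},
}


def fallback_chain(locale_id: str) -> list[str]:
    """'en_US' -> ['en_US', 'en']; a bare language such as 'en' -> ['en']."""
    language = locale_id.split("_", 1)[0]
    return [locale_id] if language == locale_id else [locale_id, language]


def lookup(key: str, locale_id: str, resources: dict = RESOURCES) -> str:
    """Return the most specific text for `key`, or the key itself if none exists."""
    for candidate in fallback_chain(locale_id):
        table = resources.get(candidate, {})
        if key in table:
            return table[key]
    return key


print(lookup("greeting", "en_US"))     # found in the language-wide file
print(lookup("date_format", "en_US"))  # overridden by the region-specific file
```

Returning the key itself for a missing entry keeps the page usable and makes untranslated strings easy to spot during testing.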
In fact, the most time-consuming part of localization for our team was getting each local environment started. With so many services involved, slight differences in the way different services are launched, and even incorrect setup documents, we kept running into pitfalls before an environment was ready. In the end, our way of dealing with this was to contact the development teams: each time we prepared to localize a service, we asked its development team to help us set up the local environment.
Another difficulty was our lack of understanding of the business. Each service has a large number of components and pages, including dynamic data from different back-end sources, so it was hard to figure everything out by ourselves. In the end, when we prepared to localize a service, we asked business analysts from its development team to walk us through the business processes involved.
Some internationalized sites have language or region switching designed as hyperlinks that allow users to access different language and region versions of the site, which do not require storing language or region configurations.
Sites with user profile configuration generally offer to set the preferred language and region in the profile settings, so that users can synchronize the last set language or region when switching devices.
If your site's users switch devices infrequently, a simple approach is to store these settings in browser storage. When the user switches devices, the default settings apply again. The advantage of this design is that it is simple, and it is easier to migrate to another solution later. Which design to choose depends on the specific business.
Localization of back-end services involves the following four components.
For example, a locale = en_US HTTP header can be used to request pages in English for the United States.
If the back-end services use different technology stacks, the localization team also needs to summarize the internationalization process for each technology stack and share it with the other development teams within the organization.
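On the back end, the requested locale usually has to be matched against the locales a service actually supports. The sketch below does this for the browser's Accept-Language header mentioned earlier and maps the result to the language_region format. It is a simplified illustration: the function names, the supported list and the en_US default are assumptions, and the matching is deliberately more naive than a full standards-compliant implementation.

```python
def parse_accept_language(header: str) -> list[str]:
    """Return the language tags from an Accept-Language header, most preferred first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # a tag without an explicit q value has weight 1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by original position.
            weighted.append((-quality, position, tag.strip()))
    return [tag for _, _, tag in sorted(weighted)]


def negotiate(header: str, supported: list[str], default: str = "en_US") -> str:
    """Pick the supported language_region locale that best matches the header."""
    by_key = {locale_id.lower(): locale_id for locale_id in supported}
    for tag in parse_accept_language(header):
        wanted = tag.replace("-", "_").lower()
        if wanted in by_key:  # exact match, e.g. pt-br -> pt_BR
            return by_key[wanted]
        language = wanted.split("_")[0]
        for key, locale_id in by_key.items():  # same language, other region
            if key.split("_")[0] == language:
                return locale_id
    return default


supported = ["en_US", "pt_BR", "de_DE"]
print(negotiate("it, de-at;q=0.8, es;q=0.5, pt-br;q=0.9", supported))
```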
Back-end services sometimes make remote calls to external services. In that case, first confirm whether the external service supports multiple languages. If it does, integrate it according to its integration documentation; if not, contact the external service provider to agree on a support plan.
Since the implementation of localization involves the transformation of more than a dozen subservices, localization can be controlled by Feature Toggles to be turned on or off in different environments. The tests affected by localization (unit tests, integration tests and UI tests) also need to be controlled via Feature Toggles so that the test suite of the original service is minimally affected.
Once all services have been localized, the localization Feature Toggles for all services can be turned on to bring the final version online.
There are two designs to choose from regarding localized Feature Toggles.
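Whichever design is chosen, the basic mechanics of a localization toggle might look like the sketch below. Everything in it is hypothetical: the `L10N_TOGGLE_*` variable names, the `billing` service and the tiny catalog are invented for illustration and are not the article's actual configuration.

```python
import os
from typing import Mapping, Optional

# Hypothetical catalog of translated labels, keyed by locale identifier.
CATALOG = {"zh_CN": {"Save": "保存"}}


def localization_enabled(service: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """A per-service toggle overrides the global one; the default is off."""
    env = os.environ if env is None else env
    fallback = env.get("L10N_TOGGLE_ALL", "off")
    value = env.get(f"L10N_TOGGLE_{service.upper()}", fallback)
    return value.lower() in {"on", "true", "1"}


def render_label(text: str, locale_id: str, service: str,
                 env: Optional[Mapping[str, str]] = None) -> str:
    """Translate only when the toggle is on, so existing behavior is unchanged when off."""
    if not localization_enabled(service, env):
        return text
    return CATALOG.get(locale_id, {}).get(text, text)


print(render_label("Save", "zh_CN", "billing", env={}))
print(render_label("Save", "zh_CN", "billing", env={"L10N_TOGGLE_BILLING": "on"}))
```

Passing the environment in as a parameter lets tests exercise both toggle states without touching real environment variables, which helps keep the original test suite unaffected.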

The figure above shows a website built on a micro front-end architecture: the whole interface is composed of pages from five services, A/B/C/D/E. The language switch button is in Service A. When the user switches from English to Chinese, the other services B/C/D/E need to switch their respective interfaces to the Chinese version as well.
One way is to have the internationalization (i18n) library instance initialized by Service A and mounted on the browser window object when the browser loads the page, and Services B/C/D/E use the internationalization library instance object initialized by Service A. When switching languages, the internationalization instance object of Service A switches the language of all services.
The locale files for each service can be loaded into the browser uniformly by Service A. The advantage of this approach is that we know when the last language file is loaded, which means that the localization of all services on the whole page is initialized and the user can switch languages normally.
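The shared-instance idea can be modeled independently of the browser. The sketch below is written in Python only to show the mechanics; in a real micro front-end the instance would live on the browser window object as described above. The `SharedI18n` class, its method names and the sample catalogs are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Catalog = Dict[str, Dict[str, str]]  # language -> {key -> translated text}


class SharedI18n:
    """One instance, created by Service A and used by every other service."""

    def __init__(self, expected_services: Iterable[str], language: str = "en") -> None:
        self.language = language
        self._expected = set(expected_services)
        self._catalogs: Dict[str, Catalog] = {}
        self._listeners: List[Callable[[str], None]] = []

    def load_catalog(self, service: str, catalog: Catalog) -> None:
        """Service A loads each service's locale files into the shared instance."""
        self._catalogs[service] = catalog

    @property
    def ready(self) -> bool:
        """True once the locale files of every expected service are loaded."""
        return self._expected <= set(self._catalogs)

    def subscribe(self, listener: Callable[[str], None]) -> None:
        """Services register a callback that re-renders their own interface."""
        self._listeners.append(listener)

    def switch_language(self, language: str) -> None:
        if not self.ready:
            raise RuntimeError("locale files are still loading")
        self.language = language
        for listener in self._listeners:
            listener(language)

    def translate(self, service: str, key: str) -> str:
        return self._catalogs[service].get(self.language, {}).get(key, key)


i18n = SharedI18n(expected_services=["A", "B"])
i18n.load_catalog("A", {"en": {"title": "Home"}, "zh": {"title": "首页"}})
i18n.load_catalog("B", {"en": {"cart": "Cart"}, "zh": {"cart": "购物车"}})

rendered: List[str] = []
i18n.subscribe(lambda language: rendered.append(i18n.translate("B", "cart")))
i18n.switch_language("zh")
print(rendered)
```

Because loading goes through one place, the `ready` flag captures the advantage described above: the language switch can stay disabled until the last locale file has arrived.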
Localization testing verifies that the application or website content meets the language, cultural and geographical requirements of a particular country or region.
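One simple automated check that often belongs in a localization test suite is a completeness check: every key in the reference language should exist, with non-empty text, in every other locale. The sketch below is an illustrative assumption, not a method described in the article; the `missing_keys` function and the sample catalogs are hypothetical.

```python
from typing import Dict, List


def missing_keys(catalogs: Dict[str, Dict[str, str]], reference: str = "en") -> Dict[str, List[str]]:
    """Map each locale to the reference keys it lacks or leaves empty."""
    expected = set(catalogs[reference])
    report: Dict[str, List[str]] = {}
    for locale_id, entries in catalogs.items():
        if locale_id == reference:
            continue
        missing = sorted(key for key in expected if not entries.get(key, "").strip())
        if missing:
            report[locale_id] = missing
    return report


# Hypothetical catalogs: zh_CN lacks one key and de_DE has an empty translation.
catalogs = {
    "en": {"save": "Save", "cancel": "Cancel"},
    "zh_CN": {"save": "保存"},
    "de_DE": {"save": "Speichern", "cancel": ""},
}
print(missing_keys(catalogs))
```

A check like this only catches missing text; whether a translation is culturally and linguistically appropriate still needs human review.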

For more details, see the article "Localization testing: why and how to do it".
A very important part of localization is selecting a suitable translation management system (TMS), which generally has the following features.
Major localization platforms.
This concludes some introductions to internationalization and the basic process of localization. Localization is a complex task, and the biggest difficulty is not knowing enough about the target language and culture. But after you've read this article, I hope it will give you more confidence to do localization-related work.