Localization files are text files that can be opened and edited in a text editor such as Notepad or TextEdit, or in one of the many enhanced text editors used by programmers. These files generally follow the key-value principle: they contain a list of text snippets (strings), each associated with a unique ID (key). Each string is thus the value of its key. The simple example below uses the format of the localization files used in Java programming:
key1 = value1
key2 = value2
...
keyN = valueN
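Reading such a file amounts to splitting each line at the first `=`. The sketch below assumes one `key = value` pair per line. It skips blank lines and lines starting with `#` or `!`, but ignores the escaping, `:` separators and line continuations that real Java `.properties` files also allow. The function name `parse_key_values` is hypothetical.

```python
def parse_key_values(text: str) -> dict[str, str]:
    """Parse simple 'key = value' lines into a dictionary.

    Simplified sketch: real Java .properties files also allow ':' as a
    separator, backslash escapes and continuation lines (not handled here).
    """
    entries: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key-value line
        entries[key.strip()] = value.strip()
    return entries


sample = """
key1 = value1
key2 = value2
"""
print(parse_key_values(sample))  # {'key1': 'value1', 'key2': 'value2'}
```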
Creation of localization files
Localization files are plain-text files with a simple structure. They can be manually created but are usually automatically generated by internationalization utilities or scripts that are available for different development environments. The automatic creation of localization files ensures that file structures are valid.
To create a localization file, all pieces of displayable text are replaced with unique IDs in the code files. The text strings are then added to the localization file with their IDs.
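The second half of this step, assigning keys and writing the localization file, can be sketched as follows. The sketch assumes the displayable strings have already been collected from the code. Real internationalization tools also rewrite the code to reference the keys. The key-naming scheme (lowercase words joined by `_`, with a numeric suffix on collisions) and the names `make_key` and `extract_strings` are hypothetical.

```python
import re


def make_key(text: str) -> str:
    """Derive a hypothetical key from the text: lowercase words joined by '_'."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "_".join(words)


def extract_strings(strings: list[str]) -> tuple[dict[str, str], str]:
    """Assign a unique key to each displayable string.

    Returns the key-to-string mapping and the content of a
    'key = value' localization file built from it.
    """
    mapping: dict[str, str] = {}
    for text in strings:
        base = make_key(text) or "string"
        key = base
        counter = 2
        # Add a numeric suffix if two different strings produce the same key.
        while key in mapping and mapping[key] != text:
            key = f"{base}_{counter}"
            counter += 1
        mapping[key] = text
    lines = [f"{key} = {value}" for key, value in mapping.items()]
    return mapping, "\n".join(lines)


mapping, file_text = extract_strings(["Log in", "Forgot your password?"])
print(file_text)
# log_in = Log in
# forgot_your_password = Forgot your password?
```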
Use of localization files
Instead of the actual text strings, the code now contains only keys. When the software generates a view for the user, these keys are used to look up the associated strings in the localization file.
If an application is set up to be used in English and Spanish, all English text may be kept in a file called English.txt, which is the default text location. If a user does not select a language, all text is pulled from this file to generate any display. If the user selects Spanish, the software is redirected to Spanish.txt. Many languages can be used with a system like this.
The advantage is that the choice of display language does not affect the code. If the software needs to display a login button, it requests the string associated with the key login_button and only needs to know which file to look in to retrieve the appropriate string for the selected language.
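The lookup described above can be sketched with one dictionary per language file. The file contents, the Spanish string and the `lookup` function are hypothetical. Falling back to the default language for an untranslated key is an assumed design choice, not something the text above requires.

```python
from typing import Optional

# Hypothetical contents of English.txt and Spanish.txt, already parsed
# into dictionaries (for example by a 'key = value' parser).
LOCALIZATION_FILES: dict[str, dict[str, str]] = {
    "English": {"login_button": "Log in", "logout_button": "Log out"},
    "Spanish": {"login_button": "Iniciar sesión"},
}
DEFAULT_LANGUAGE = "English"


def lookup(key: str, language: Optional[str] = None) -> str:
    """Return the string for `key` in the selected language.

    Falls back to the default language when no language is selected,
    the language is unknown, or the key has no translation yet (assumption).
    """
    selected = LOCALIZATION_FILES.get(language or DEFAULT_LANGUAGE, {})
    if key in selected:
        return selected[key]
    # Missing translation: use the default-language string instead.
    return LOCALIZATION_FILES[DEFAULT_LANGUAGE][key]


print(lookup("login_button"))              # Log in
print(lookup("login_button", "Spanish"))   # Iniciar sesión
print(lookup("logout_button", "Spanish"))  # Log out
```

The calling code only ever passes the key `login_button`. Switching languages changes which dictionary is consulted, not the code.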
String management
As a key-based translation platform, Phrase supports many different resource file types. After files are uploaded, the keys and their associated string values are extracted and presented to the translator in a standardized format. Translators can focus on their task without having to worry about the exact format of the localization file. They can still inspect the keys, because a key itself can provide crucial context and guide them to the correct word choice.
When all strings are translated, the files are downloaded. In the process, localization files are created in the required format, matching the original source file.
Resource file formats
Four broad types of resource files are supported. All of them are essentially text-based and can be opened and inspected in a text editor.
Spreadsheets
.XLSX and .CSV files are supported. These formats are equivalent for localization purposes and contain rows of key-value pairs. The keys are in one column, while the corresponding values are in an adjacent column. Which column is used for which purpose depends on the application, and a localizer needs to configure Phrase to interpret the columns correctly. Zendesk .CSV files have a fixed structure, so this file type does not require further adjustments:
"Title","Default language","Default text","English text","Variant status"
"simple_key","German","Einfacher Schlüssel.","Simple key.","Current"
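The column configuration can be sketched with Python's standard `csv` module, passing the key and value columns in explicitly. The function name `read_csv_strings` is hypothetical. Choosing `Title` as the key column and `Default text` or `English text` as value columns for the Zendesk example is an assumption based on the headers shown above. .XLSX files would need a spreadsheet library and are not covered here.

```python
import csv
import io


def read_csv_strings(text: str, key_column: str, value_column: str) -> dict[str, str]:
    """Map each row's key column to its value column.

    Which columns hold keys and values depends on the application,
    so both are passed in, mirroring the configuration step.
    """
    reader = csv.DictReader(io.StringIO(text))
    return {row[key_column]: row[value_column] for row in reader}


zendesk_csv = (
    '"Title","Default language","Default text","English text","Variant status"\n'
    '"simple_key","German","Einfacher Schlüssel.","Simple key.","Current"\n'
)
print(read_csv_strings(zendesk_csv, "Title", "Default text"))
# {'simple_key': 'Einfacher Schlüssel.'}
print(read_csv_strings(zendesk_csv, "Title", "English text"))
# {'simple_key': 'Simple key.'}
```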
XML
XML is a format that offers meta information in the form of <tags>. The tag structure is used to determine where the keys and their corresponding values are, as shown here in an Android XML file:
<string name="simple_key">Just a key with a message.</string>
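Because the key sits in the `name` attribute and the value in the element text, Python's standard `xml.etree.ElementTree` is enough to read such a file. The sketch assumes the usual `<resources>` root element of an Android strings file. It ignores Android-specific escaping and other resource types such as plurals. The function name `read_android_strings` is hypothetical.

```python
import xml.etree.ElementTree as ET


def read_android_strings(xml_text: str) -> dict[str, str]:
    """Collect <string name="..."> elements into a key-value dictionary."""
    root = ET.fromstring(xml_text)
    # The name attribute is the key; the element text is the value.
    return {
        element.get("name"): element.text or ""
        for element in root.iter("string")
    }


android_xml = """<resources>
    <string name="simple_key">Just a key with a message.</string>
</resources>"""
print(read_android_strings(android_xml))
# {'simple_key': 'Just a key with a message.'}
```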
Two standard XML translation formats are .TMX and .XLIFF. These hold not only keys and values in one language but also associate values in a source language with the corresponding values in a target language. Such files are typically bilingual, as this translation unit in a Symfony XLIFF file shows:
<trans-unit id="simple_key" resname="simple_key">
  <source xml:lang="de-DE">Nur ein einfacher Schlüssel mit einer einfachen Nachricht.</source>
  <target xml:lang="en-GB">Just a simple key with a simple message.</target>
</trans-unit>
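A bilingual file yields a source and a target value per key. In the sketch below, the translation unit is wrapped in a minimal XLIFF 1.2 document. That wrapper is an assumption, since only the `<trans-unit>` is shown above. Namespace prefixes are stripped so that the code works whether or not the file declares the XLIFF namespace. The names `local_name` and `read_trans_units` are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix so the code works with or without one."""
    return tag.rsplit("}", 1)[-1]


def read_trans_units(xml_text: str) -> list[dict[str, str]]:
    """Return the key plus source/target text and language for each <trans-unit>."""
    units = []
    for element in ET.fromstring(xml_text).iter():
        if local_name(element.tag) != "trans-unit":
            continue
        unit = {"key": element.get("resname") or element.get("id", "")}
        for child in element:
            name = local_name(child.tag)
            if name in ("source", "target"):
                unit[name] = child.text or ""
                unit[name + "_lang"] = child.get(XML_LANG, "")
        units.append(unit)
    return units


xliff_document = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file source-language="de-DE" target-language="en-GB" datatype="plaintext" original="file.ext">
    <body>
      <trans-unit id="simple_key" resname="simple_key">
        <source xml:lang="de-DE">Nur ein einfacher Schlüssel mit einer einfachen Nachricht.</source>
        <target xml:lang="en-GB">Just a simple key with a simple message.</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

for unit in read_trans_units(xliff_document):
    print(unit["key"], "->", unit["target"])
# simple_key -> Just a simple key with a simple message.
```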
Qt programs use resource files whose structure is very similar to these standardized formats but, for historical reasons, has a different layout.
Plain key-value lists
Some resource files contain only simple lists of keys and values, as this snippet from a Ruby on Rails YAML file shows:
simple_key: Just a simple key with a simple message.
Many different programming languages or platforms use such formats with minor layout differences.
Since these are monolingual files, a localization program needs to maintain parallel versions of them: one for the source language and one for each target language.
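With parallel monolingual files, the key is what links a source string to its translation. The sketch below reads flat `key: value` lines and lists the keys that still lack a translation. It handles only a tiny subset of YAML: no nesting (real Rails files usually nest keys under a locale such as `en:`), no quoting, no multi-line values. A real tool would use a YAML library instead. The names `parse_simple_yaml` and `missing_translations` are hypothetical.

```python
def parse_simple_yaml(text: str) -> dict[str, str]:
    """Parse flat 'key: value' lines (a tiny subset of YAML, no nesting)."""
    entries: dict[str, str] = {}
    for line in text.splitlines():
        # Split at the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            entries[key.strip()] = value.strip()
    return entries


def missing_translations(source: dict[str, str], target: dict[str, str]) -> list[str]:
    """Keys present in the source-language file but absent or empty in the target file."""
    return [key for key in source if not target.get(key)]


source_file = "simple_key: Just a simple key with a simple message.\ngreeting: Hello"
target_file = "simple_key: Nur ein einfacher Schlüssel mit einer einfachen Nachricht."
print(missing_translations(parse_simple_yaml(source_file), parse_simple_yaml(target_file)))
# ['greeting']
```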
Gettext produces key-value files containing additional information, such as descriptive comments or plural variants:
# This is the amazing description for this key!
msgid "key_with_description"
msgid_plural ""
msgstr[0] "Check it out! This key has a description! (At least in some formats)"
msgstr[1] "Check it out! This key has %s descriptions! (At least in some formats)"
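The extra information in such an entry can be read line by line. The sketch below handles `#` comments, `msgid`, `msgid_plural` and `msgstr[n]` written one field per line. It does not handle multi-line strings, escape sequences or the `Plural-Forms` header. Choosing the variant with the English-style rule (form 0 for exactly one, form 1 otherwise) is an assumption, since real plural rules depend on the target language. The names `parse_po_entry` and `plural_form` are hypothetical.

```python
import re

# One field per line: msgid_plural, msgid, msgstr or msgstr[n], then a quoted string.
ENTRY_LINE = re.compile(r'^(msgid_plural|msgid|msgstr(?:\[(\d+)\])?)\s+"(.*)"$')


def parse_po_entry(text: str) -> dict:
    """Parse a single PO entry into comments, msgid, msgid_plural and msgstr forms."""
    entry: dict = {"comments": [], "msgstr": {}}
    for line in text.strip().splitlines():
        line = line.strip()
        if line.startswith("#"):
            entry["comments"].append(line.lstrip("#").strip())
            continue
        match = ENTRY_LINE.match(line)
        if not match:
            continue
        field, index, value = match.groups()
        if field.startswith("msgstr"):
            # A plain 'msgstr' without an index is stored as form 0.
            entry["msgstr"][int(index or 0)] = value
        else:
            entry[field] = value
    return entry


def plural_form(entry: dict, n: int) -> str:
    """Pick a plural variant with the English-style rule (an assumption)."""
    return entry["msgstr"][0 if n == 1 else 1]


po_entry = '''
# This is the amazing description for this key!
msgid "key_with_description"
msgid_plural ""
msgstr[0] "Check it out! This key has a description! (At least in some formats)"
msgstr[1] "Check it out! This key has %s descriptions! (At least in some formats)"
'''

entry = parse_po_entry(po_entry)
print(entry["comments"][0])  # This is the amazing description for this key!
# The program, not the PO file, substitutes the count for %s.
print(plural_form(entry, 3).replace("%s", "3"))
# Check it out! This key has 3 descriptions! (At least in some formats)
```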
Competing formats offer similar functionality, with layouts that differ in relatively minor ways.
Associative arrays
While other formats require custom code (parsers) to read them, some formats are easier for developers and localizers to work with. Formats based on .JSON (JavaScript) and .PHP arrays can be read directly and map onto common code structures (arrays) that are easy to manipulate. These arrays can be complex, and different applications generate their own custom array structures.
For example, go-i18n JSON refers to keys as id:
{ "id": "simple_key", "translation": "simple key, simple message, so simple." },
Angular uses the keys themselves as keys in its arrays:
"simple_key": "I am a simple key with a simple message."
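Both layouts decode directly with Python's standard `json` module. Only a small adapter per layout is needed to bring them into one key-to-string dictionary. The sketch assumes the go-i18n snippet is one element of a top-level JSON array, as its trailing comma suggests, and that the Angular-style file is a flat object. Nested key structures are not handled. The names `load_go_i18n` and `load_flat_map` are hypothetical.

```python
import json


def load_go_i18n(text: str) -> dict[str, str]:
    """go-i18n layout: a list of objects whose 'id' is the key."""
    return {item["id"]: item["translation"] for item in json.loads(text)}


def load_flat_map(text: str) -> dict[str, str]:
    """Angular-style layout: the keys themselves are the object keys."""
    return dict(json.loads(text))


go_i18n_json = '[{"id": "simple_key", "translation": "simple key, simple message, so simple."}]'
angular_json = '{"simple_key": "I am a simple key with a simple message."}'
print(load_go_i18n(go_i18n_json))
# {'simple_key': 'simple key, simple message, so simple.'}
print(load_flat_map(angular_json))
# {'simple_key': 'I am a simple key with a simple message.'}
```

Both functions return the same shape, so the rest of a localization tool can treat the two layouts identically.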
Because of these minor but crucial differences, each of the widely used .JSON and .PHP array structures is supported.