A translation memory (TM) is a database of previously translated text and is a key component of CAT tools. Text is split into smaller segments (usually sentences or titles) during segmentation. The original segment and its translation are then saved into the translation memory as a translation unit when that segment is confirmed by a user in an editor.
Segments that are automatically confirmed based on pre-translation settings are not saved, as they are already in the selected TM, but their context is added to the existing TM.
If a TM is created and added to an existing project, confirmed segments from that project will be added to the new TM.
A TM can have several target languages but only one source language and can be used in multiple projects simultaneously.
There are three main benefits to using a TM:
A TM allows for the reuse of translations. This speeds up the translation process and reduces costs.
A TM helps to ensure translation consistency. This is important when a client has more than one translator working on a project.
A TM serves as a long-term backup when the project or job is no longer available. It is strongly recommended that linguists confirm each segment they have put any effort into before moving on to another segment to ensure the effort is not lost.
While the use of translation memories is highly recommended, there are some limitations:
While there are no specific limits to how many target languages a TM can have or how many translation units can be saved in a TM, very large TMs (millions of segments) slow down the performance of searches, pre-translation or analysis and can be difficult to maintain and edit.
File size is limited to 1 GB for exporting and importing. A project can have multiple TMs, so it is always better to have a few smaller, well managed TMs than one very large one.
A maximum of 10 translation memories can be assigned to any project/language pair per each workflow step.
Translation memory (TM) is essential for producing consistent translations and can dramatically reduce translation costs. If a TM is not set up correctly and maintained, inconsistent and poor quality translations are produced.
Follow these rules to improve TM quality:
-
Choose trusted providers
Have a group of trusted providers (linguists/vendors) who deliver high-quality output that is saved to the master TM. When working with a provider for the first time or with someone whose output quality varies, consider using a secondary working translation memory where they can commit segments and keep the master TM in read-only mode. Use the master TM in Read and Write mode in later workflow steps where review is performed.
Preventing content of questionable quality from being included in a TM is easier than removing it later.
-
Add context information to source files
Context information allows linguists to better understand content they are translating and improves the quality of the translation. There are different options for providing context such as attaching assets as reference files to projects or adding them on the segment level. For file formats with context key and notes properties, information can be displayed on segment level in a CAT tool. Some editors can display animations and graphics from attached external links.
-
Lock segments with high-quality matches
Pre-translating content from translation memory and locking high-score matches (context matches) prevents unwanted changes in the TM. Excluding locked segments from analysis and quotes shared with a provider reduces translation volume and costs.
-
Perform quality assurance and spell checks before confirming to TM
Misspellings, missing tags and incorrect punctuation are easily overlooked. Automated quality assurance (QA) checks help with this. Advanced QA checks can also verify that correct terminology has been used, which ensures translation consistency. Some tools enable segment-level QA, which does not allow the provider to confirm segments and save them into the TM if quality assurance errors have been found. If segment-level QA is not available (and the check is performed at the end of the localization process), use the working TM approach.
-
Perform linguistic quality assurance (LQA) evaluation
LQA evaluation is used to measure and qualify the translations and errors produced. It evaluates translation quality and provides constructive feedback to the provider.
-
Update your TM with any changes that happen outside of your translation management system
If linguistic edits take place in the native format or in a content management system, they are not saved to the TM and will be overwritten by future submissions of the same content unless the TM is updated. In such a scenario, update the TM manually.
-
Close the feedback loop
Discuss the quality of delivered translations with the provider and allow them to see the changes made to their work. It is important to clarify expectations and review detected issues to avoid encountering them again in the future.
To create a translation memory, follow these steps:
-
Translation memories can be created from three places:
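Click the plus icon beside the translation memories item in the left-hand navigation panel.
Click New from the translation memories page.
Click Create new from the translation memories table on a project page.
The page for creating a translation memory opens.
-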
Provide a name.
Translation memories can be used for multiple projects, so the name does not need to be specific to a project.
-
Provide a source language.
This is the original language of your document.
Only one source language can be selected per translation memory.
-
Provide target languages.
These are the languages to be translated into.
There can be an unlimited number of target languages in a translation memory, but a maximum of 10-15 languages is recommended. Fewer than 30 languages is still manageable, but more than 50 languages cause the TM to become slow and hard to work with.
Provide business information and a note if applicable.
-
Click Create.
If created from a project page, the new TM is added to the list on that page.
If created elsewhere, the new TM page opens.
Clicking on a TM from the translation memories panel or a project page opens the translation memory page.
Depending on user rights:
The attributes of a translation memory can be edited.
A TM can be deleted. This moves the TM to the recycle bin where it will remain for 30 days.
Content can be searched for individual entries requiring editing.
Content can be imported from other TMs or other CAT tools.
Content of a TM can be exported for editing in another CAT tool and imported back.
Previously translated files can be aligned with their originals and imported to a new TM.
A table on the translation memory page presents all projects a specific TM is associated with.
Once added, target languages cannot be removed from a TM as long as:
There is an entry in the TM (even if only in other target languages).
The TM is used in an existing project.
The TM is used in any Project Template.
The attributes of a translation memory allow users to group, filter, and sort TMs. Attributes can also be used to restrict or allow access for guests, restricted Project Managers or users with limited rights.
Attributes do not apply to translation units stored in that TM.
To edit translation memory attributes, follow these steps:
In order to use a translation memory for analysis, pre-translation, or actual translation in an editor, the TM must be assigned to a project.
Multiple TMs can be assigned to one project and a single TM can be assigned to multiple projects. There can be up to 10 TMs assigned to each project per language and Workflow step.
To assign a translation memory to a project, follow these steps:
-
From a project page, click Select from the table.
A selection window opens. If there is only one target language, this step is skipped.
-
Select required TMs and workflow steps. Click Continue.
The selection page opens and available TMs can be filtered.
The general search field does not return results based on ID number. To search for TMs by ID number, use the ID search field.
Strict locales can be applied to filtered source and target languages.
-
Select the required TMs.
Selected TMs are added to a table of selected TMs, from which they can also be removed.
-
Set options for selected TM(s):
-
Write mode: any segments confirmed in an editor or uploaded are saved into the TM.
Write mode is not required, and a maximum of two TMs per language and workflow step in a project can use it.
-
Set the penalty percentage for TM matches in analysis, pre-translation and an editor.
-
Switch ON to manually set the order.
-
Click Save all.
The project page opens with assigned TMs listed in the translation memories table.
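The assignment limits described in this section (up to 10 TMs per language and workflow step, of which at most two save confirmed segments) can be pictured as a simple check. This is only a sketch: the function name, the dictionary keys and the constant names are hypothetical and are not part of any product API.

```python
MAX_TMS_PER_STEP = 10        # limit stated per project, language and workflow step
MAX_WRITE_TMS_PER_STEP = 2   # limit stated for TMs that save confirmed segments


def check_assignment(tms):
    """Return a list of problems for the TMs assigned to one language and workflow step.

    `tms` is a list of dicts such as {"name": "Master", "write": False}.
    """
    problems = []
    if len(tms) > MAX_TMS_PER_STEP:
        problems.append(f"{len(tms)} TMs assigned, maximum is {MAX_TMS_PER_STEP}")
    write_count = sum(1 for tm in tms if tm.get("write"))
    if write_count > MAX_WRITE_TMS_PER_STEP:
        problems.append(
            f"{write_count} TMs in write mode, maximum is {MAX_WRITE_TMS_PER_STEP}"
        )
    return problems


# A master TM kept read-only plus one working TM that receives confirmed segments.
assignment = [
    {"name": "Master", "write": False},
    {"name": "Working", "write": True},
]
print(check_assignment(assignment))  # prints []
```

An empty list means the assignment stays within both limits.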
Equal TM matches are prioritized based on the order shown in the Translation memories section of the project page.
Relevant TMs are displayed first and order is based on the priority set in the table of assigned TMs.
When working with 101% matches from a TM, the previous and following segments provide context that can be saved with each segment.
Context is used to determine if the match in TM is:
-
101%
An in-context match.
-
100%
Source text is a match, but context of the new text is different.
This becomes important when the context of the segment results in two different translations of the same original text.
Example:
In Czech, a female 'Project manager' is translated differently than a male 'Project manager'.
If surrounding segments create context that can be used to identify the difference, both translations are saved to the translation memory and are presented as a 101% match when the same context is provided.
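The difference between a 101% and a 100% match can be sketched in a few lines. The code below assumes the default behavior in which both the previous and the next segment must match. The `Segment` type, the `match_score` function and the sample sentences are hypothetical illustrations, not the scoring logic of any real CAT tool.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    source: str
    prev_text: Optional[str]  # text of the previous segment
    next_text: Optional[str]  # text of the following segment


def match_score(job_seg: Segment, tm_seg: Segment) -> int:
    """Return 101 for an in-context match, 100 for a text-only match, else 0."""
    if job_seg.source != tm_seg.source:
        return 0  # fuzzy matching is outside the scope of this sketch
    if job_seg.prev_text == tm_seg.prev_text and job_seg.next_text == tm_seg.next_text:
        return 101
    return 100


tm_entry = Segment("Project manager", "Anna Nováková", "Contact her by email.")
same_context = Segment("Project manager", "Anna Nováková", "Contact her by email.")
other_context = Segment("Project manager", "Jan Novák", "Contact him by email.")

print(match_score(same_context, tm_entry))   # prints 101
print(match_score(other_context, tm_entry))  # prints 100
```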
Context types
The type of context saved with the segment to the translation memory is set in the file import settings when the job is imported. Every file can be imported with different settings.
A translation memory can contain segments with different types of context:
-
Automatic
Context type will be selected automatically based on the file type.
Files imported with the context type Segment Key: ANDROID_STRING, CHROME_JSON, DESKTOP_ENTRY, .DTD, JAVA PROPERTIES, JOOMLA_INI, .JSON, MAC_STRINGS, MOZILLA_PROPERTIES, .PHP, .PLIST, .PO (gettext), .RESJSON, .RESX, .TS, .XML_PROPERTIES, .YAML
Other formats will be imported with the context type Previous and next segment.
-
Previous and next segments
Both the previous and next segment will be saved as context.
-
Segment key
The segment key or the segment ID will be saved as context. This can be specified for the above mentioned Segment key file formats and also customized for: .CSV, .XML, Multilingual XML and Multilingual MS Excel files.
In some file formats, the segment key is more important than context (.YAML, .JSON, etc.).
-
No context
If context can be ignored, no context will be saved and the translation will always be overwritten by the most recently modified version.
No context is also applied when the provided context is not found.
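The Automatic option described above amounts to a lookup by file format. A minimal sketch follows, assuming format names are normalized to upper case without a leading dot. The function name and the returned labels are hypothetical; only the list of formats comes from the text.

```python
# Formats listed above as being imported with the Segment key context type.
SEGMENT_KEY_FORMATS = {
    "ANDROID_STRING", "CHROME_JSON", "DESKTOP_ENTRY", "DTD", "JAVA_PROPERTIES",
    "JOOMLA_INI", "JSON", "MAC_STRINGS", "MOZILLA_PROPERTIES", "PHP", "PLIST",
    "PO", "RESJSON", "RESX", "TS", "XML_PROPERTIES", "YAML",
}


def automatic_context_type(file_format: str) -> str:
    """Pick the context type that the Automatic setting would use for a format."""
    normalized = file_format.strip().lstrip(".").upper().replace(" ", "_")
    if normalized in SEGMENT_KEY_FORMATS:
        return "segment_key"
    return "previous_and_next_segment"


print(automatic_context_type(".yaml"))  # prints segment_key
print(automatic_context_type(".docx"))  # prints previous_and_next_segment
```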
The translation memory match can be further optimized for the imported jobs using options in the settings. These options can also be set up in the main Project Settings:
-
If the context matches in either the previous or the next segment, the match is offered as a 101% match. By default, both the previous and the next segment must match.
-
(enabled by default)
If the tag metadata in the job differs from the tag metadata in the TM, the difference is ignored. The tag metadata from the job's original segment is automatically added to the job's translated segment.
-
When more than one 101% match with a different target (translation) is found, all 101% matches are displayed as 100% with an arrow signaling the penalization.
-
If the Pre-translation threshold is set to 101%, segments with multiple matches will not be pre-translated.
-
In Analysis:
Segments with multiple 101% matches will be counted as 100% matches.
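The handling of multiple 101% matches can be sketched as follows. Matches are modeled as hypothetical `(score, target)` tuples, and both function names are illustrative assumptions rather than actual product behavior beyond what the text states.

```python
def penalize_ambiguous_context_matches(matches):
    """Downgrade all 101% matches to 100% when they disagree on the target."""
    context_targets = {target for score, target in matches if score == 101}
    if len(context_targets) <= 1:
        return list(matches)  # zero or one distinct 101% target: nothing to penalize
    return [(100 if score == 101 else score, target) for score, target in matches]


def can_pretranslate(matches, threshold=101):
    """True if any match still reaches the pre-translation threshold."""
    penalized = penalize_ambiguous_context_matches(matches)
    return any(score >= threshold for score, _ in penalized)


ambiguous = [(101, "Projektový manažer"), (101, "Projektová manažerka")]
print(penalize_ambiguous_context_matches(ambiguous))
print(can_pretranslate(ambiguous))  # prints False
```

With a threshold of 101% the ambiguous segment is skipped, while in analysis it would be counted in the 100% band.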
-
Translation memory matches can be prioritized based on the following project metadata fields stored for each segment within the TM:
Client.
Domain.
Subdomain.
Filename.
In case of multiple TM entries with the same source text, the TM entry with metadata that matches project metadata is positioned higher in the CAT pane and used for pre-translation regardless of the TM priority.
Segments with a higher match score are always prioritized: a 100% match with no metadata prevails over a 99% match that fully matches project metadata.
Set up project metadata for TM matches
TM matches prioritization based on project metadata can be set up for the imported jobs using options in the File Import settings. The same options are also available in the main project settings.
To enable TM matches prioritization, follow these steps:
Open the File Import settings or the main project settings to display the relevant page.
-
Click the Select metadata dropdown menu.
A list of available project metadata is displayed.
-
Click on the desired metadata from the list.
Selected metadata are added at the top with a progressive number indicating their order of importance.
Optionally, drag and drop the selected metadata in the field to adjust their order of importance when prioritizing TM matches.
Click Save to apply the settings.
Example of TM matches prioritization
Priority chosen for project metadata is:
Filename.
Client.
Domain.
Subdomain.
The TM entry with the Filename field matching a project's filename is prioritized. If the TM entries have the same filename fields or their filename fields do not match the project's filename, prioritization is based on Client metadata.
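The prioritization rules can be expressed as a sort: match score first, then metadata fields in the chosen order of importance. The sketch below uses hypothetical dictionaries for TM entries and project metadata. It does not model TM order or any other tie-breaker.

```python
def rank_matches(matches, project,
                 priority=("filename", "client", "domain", "subdomain")):
    """Order TM entries by score, then by metadata agreement with the project."""
    def sort_key(match):
        agreement = tuple(
            match.get(name) is not None and match.get(name) == project.get(name)
            for name in priority
        )
        # A higher score always wins; metadata only breaks ties between equal scores.
        return (match["score"], agreement)

    return sorted(matches, key=sort_key, reverse=True)


project = {"filename": "ui.json", "client": "Acme"}
matches = [
    {"id": "A", "score": 100},
    {"id": "B", "score": 99, "filename": "ui.json", "client": "Acme"},
    {"id": "C", "score": 100, "client": "Acme"},
    {"id": "D", "score": 100, "filename": "ui.json"},
]
ranked = rank_matches(matches, project)
print([m["id"] for m in ranked])  # prints ['D', 'C', 'A', 'B']
```

Entry B matches the project metadata fully but still ranks last, because its score is lower than the 100% matches.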
Individual TM entries can be modified directly in the UI. Large-scale edits must be made outside the UI, in an exported .XLSX file. Appending the word update to the segment ID in the .XLSX file triggers an update on import.
To batch update a translation memory in a spreadsheet editor, follow these steps:
Export a TM to XLSX and ensure it is formatted correctly for import.
Open the file in an editor.
Insert two additional columns between the ID column and the first language column.
Keeping the ID information in column A, remove the ID column label and place it in column C.
Fill the cells in column B with the word update.
-
In the first cell of the ID column (C2), create the formula =(A2&"|"&B2) and press Enter.
Cell C2 is populated with the ID from cell A2 and the word update (from cell B2) separated by |.
-
Copy the formula to the rest of the ID column.
All ID column cells are populated with the appended ID information.
Make required modifications to the segments.
Save the file.
-
Import the file back.
All segments with the appended ID are updated.
Appending the ID with the word delete instead of update will delete those segments on import.
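The effect of the spreadsheet formula can also be reproduced in a script. The sketch below works on rows already loaded into memory (a header row followed by data rows with the ID in the first column). Reading and writing the .XLSX file itself would need a spreadsheet library and is left out. The function name is hypothetical.

```python
def mark_rows(rows, action="update"):
    """Append |update or |delete to every ID, mirroring =(A2&"|"&B2)."""
    if action not in ("update", "delete"):
        raise ValueError("action must be 'update' or 'delete'")
    header, *body = rows
    marked = [list(header)]
    for row in body:
        # Only the ID cell changes; all other cells are copied unchanged.
        marked.append([f"{row[0]}|{action}", *row[1:]])
    return marked


rows = [
    ["ID", "en", "cs"],
    ["1001", "Project manager", "Projektový manažer"],
    ["1002", "Save", "Uložit"],
]
for row in mark_rows(rows):
    print(row[0])
# prints:
# ID
# 1001|update
# 1002|update
```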
A translation unit (TU) is a source segment and the target segments for all languages grouped together and saved into a translation memory. It is not possible to save multiple target versions in the same language; there can be only one target segment per language.
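Within a single translation unit, the one-target-per-language rule behaves like a mapping keyed by language. The sketch below is illustrative only: the `TranslationUnit` class and its `save_target` method are hypothetical and do not reflect how any CAT tool stores its data.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TranslationUnit:
    """Hypothetical model of a TU: one source segment, one target per language."""
    source_lang: str
    source_text: str
    targets: Dict[str, str] = field(default_factory=dict)

    def save_target(self, lang: str, text: str) -> None:
        # Saving again for the same language replaces the earlier version,
        # so only one target segment per language is ever kept.
        self.targets[lang] = text


tu = TranslationUnit(source_lang="en", source_text="Save")
tu.save_target("cs", "Ulozit")
tu.save_target("de", "Speichern")
tu.save_target("cs", "Uložit")  # replaces the earlier Czech target
print(tu.targets)  # prints {'cs': 'Uložit', 'de': 'Speichern'}
```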
When a segment/TU is saved, pre-defined attributes or metadata are automatically added to the source and/or target of the TU in the translation memory.
These attributes can be saved with the target segment or source segment as indicated:
Created (date/time) - target segment
-
Created by - target segment.
Only Phrase usernames are supported.
When a non-empty segment is first confirmed by a user other than the segment's original author, the Created by field will be populated with the original author's username, and the user confirming the segment will be added to the Last modified by field.
-
Last modified (date/time) - target segment
Only the latest version of the target segment is saved to TM. This attribute marks the time when the latest version was saved.
-
Last modified by - target segment.
Only Phrase Usernames are supported.
Project - target segment
-
Client - target segment.
This is the Client as set in the project when the translation was created.
-
Domain - target segment
This is the Domain as set in the project when the translation was created.
-
Subdomain - target segment
This is the Subdomain as set in the project when the translation was created.
-
File - target segment
Name of the file where the translation was created.
-
Context - source segment
This is the Previous segment, Next segment or Segment key, depending on the context type set when the job was created.
Translation Memory attributes (Domain, Client, Business Unit, etc.) have no effect on translation unit metadata.
For importing and exporting the content of a translation memory using the .TMX format, the following metadata is supported:
-
Properties in the source TUV element:
<prop type="context_prev">Text of the previous segment</prop>
<prop type="context_next">Text of the following segment</prop>
<prop type="x-context_seg_key">Context Key</prop>*
*For context based on a segment key
-
Properties in target TUV element:
<prop type="created_at">1322746823589</prop>
<prop type="created_by">Some name</prop>
<prop type="modified_at">1323854662890</prop>
<prop type="modified_by">Some name</prop>
<prop type="project">Project name</prop>
<prop type="client">59131</prop>
<prop type="domain">6678</prop>
<prop type="subdomain">5370</prop>
<prop type="filename">File name</prop>
<prop type="aligned">false</prop>
<prop type="reviewed">false</prop>
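Properties like these can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a hand-written target TUV fragment. It assumes, based only on the 13-digit sample values, that `created_at` and `modified_at` are Unix timestamps in milliseconds. The fragment, the `<seg>` content and the `read_props` helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Hand-written fragment for illustration; a real TMX export contains many more elements.
TUV_XML = """
<tuv xml:lang="cs">
  <prop type="created_at">1322746823589</prop>
  <prop type="created_by">Some name</prop>
  <prop type="modified_at">1323854662890</prop>
  <prop type="reviewed">false</prop>
  <seg>Example target text</seg>
</tuv>
"""


def read_props(tuv):
    """Collect <prop> values by type and convert the timestamp properties."""
    props = {p.get("type"): (p.text or "").strip() for p in tuv.findall("prop")}
    for key in ("created_at", "modified_at"):
        if key in props:
            # Assumption: the value is milliseconds since the Unix epoch.
            props[key] = datetime.fromtimestamp(int(props[key]) / 1000, tz=timezone.utc)
    return props


props = read_props(ET.fromstring(TUV_XML.strip()))
print(props["created_by"])         # prints Some name
print(props["created_at"].date())  # prints 2011-12-01
```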
For importing and exporting the content of a translation memory using the .XLSX format, the following metadata is supported:
ID - Phrase internal ID
{source language code} - for example 'en' or 'en_us'
prev - text of the previous segment
next - text of the following segment
seg_key - text of the context key
mdata - metadata of Phrase tags
{target language code} - for example 'en' or 'en_us'
created_by - Phrase Username
created_at - in format 2017.07.07 14:39:52,000
modified_by - Phrase Username
modified_at - in format 2017.07.07 14:39:52,000
client - Phrase ID (number)
project - Phrase ID (number)
domain - Phrase ID (number)
subdomain - Phrase ID (number)
note - text (external use only, not visible in Phrase)
reviewed - true/false (external use only, not visible in Phrase)
aligned - true/false (external use only, not visible in Phrase)
filename - the name of the original file (test.docx)
mdata - metadata of Phrase tags
The columns are ordered with the attributes of the source segment first, followed by the attributes of the target segment.
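When editing an exported spreadsheet with a script, the created_at and modified_at values need to keep the format shown in the list above. A minimal sketch, assuming the three digits after the comma are milliseconds. The helper names are hypothetical.

```python
from datetime import datetime

XLSX_TIMESTAMP_FORMAT = "%Y.%m.%d %H:%M:%S,%f"


def parse_xlsx_timestamp(value: str) -> datetime:
    """Parse a value such as '2017.07.07 14:39:52,000'."""
    return datetime.strptime(value, XLSX_TIMESTAMP_FORMAT)


def format_xlsx_timestamp(moment: datetime) -> str:
    """Write a datetime back with a three-digit fraction after the comma."""
    # %f would print six digits, so the millisecond part is built by hand.
    return moment.strftime("%Y.%m.%d %H:%M:%S,") + f"{moment.microsecond // 1000:03d}"


parsed = parse_xlsx_timestamp("2017.07.07 14:39:52,000")
print(parsed.isoformat())             # prints 2017-07-07T14:39:52
print(format_xlsx_timestamp(parsed))  # prints 2017.07.07 14:39:52,000
```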