External terminology (or glossary) files can be imported in Excel (.XLSX) or .TBX file formats. The maximum size of an uploaded file is 1 GB. An exported term base contains all languages of the given term base.
If a term base requires bulk changes, such as mass deletions or updates to many terms, the terms can be exported and changed in bulk in an external spreadsheet application.
To import content, follow these steps:
-
From a term base page, click Import.
The Import TBX/XLSX window opens.
-
Choose a file to import:
-
XLSX
An XML-based file format for spreadsheet applications (Excel).
The file must be prepared in a specific manner for import.
.XLSX is the easiest way to import terms into a term base. A plain list of terms can be imported, but more complex terminology imports are also supported (import of synonyms, morphology, terms with various attributes, etc.).
-
TBX
An exchange format for use in other CAT tools. It can also be used for editing content in external tools such as Okapi Olifant.
-
-
Select options:
-
-
-
Prevents the import of a language if it has a different locale than the project.
Example:
A file with an EN_US designation will not be imported into a term base designated with just EN and not EN_US.
-
Every term in a term base has a list of attributes that can be exported to, or imported from .TBX or .XLSX files. Some of these attributes can be edited directly in the term setting or edited externally in the .XLSX or .TBX file.
The attributes of a term base (Client, Domain, etc.) have no effect on the individual term attributes.
.XLSX files
Term metadata such as forbidden, preferred, case or exact are stored as boolean (TRUE/FALSE) values in Excel. Depending on your Windows locale settings, these values may appear in your language in the .XLSX export (e.g. WAHR/FALSCH for German). When editing a term base in Excel, follow the pattern and type these values in the required language so that Excel can recognize them and maintain file integrity. For other columns, use English values as stated below.
-
CID
Phrase Concept ID. The term concept includes the source and all its targets and synonyms.
-
concept_domain
-
concept_subdomain
-
concept_url
-
concept_definition
-
concept_note
-
TID
Phrase term ID. The ID of the specific term in the specific language.
-
{Language code}
A term's language code based on supported languages.
-
status
Either New or Approved.
-
forbidden
True or False.
-
preferred
True or False.
-
case
Case sensitive. Either True or False.
-
exact
Exact match. Either True or False (False means fuzzy match).
-
note
Only the target note will be displayed in the editor.
-
usage
Only the target usage will be displayed in the editor.
-
POS
Part of Speech; values can be Adjective, Noun, Verb, or Adverb.
-
gender
Values can be Masculine, Feminine or Neutral.
-
number
Values can be Singular, Plural or Uncountable.
-
short_translation
-
term_type
Values can be Full_form, Short_form, Acronym, Abbreviation, Phrase, or Variant.
-
created_by
Only Phrase usernames are supported.
-
created_at
Date and time of the term creation.
-
modified_by
Only Phrase usernames are supported.
-
modified_at
Date and time of the last modification of the term.
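The value lists above can be checked before an import. The following is a minimal Python sketch, not part of Phrase: the function name check_row, the strict comparison of values, and the decision to skip empty cells are all assumptions made for illustration. Rows are modeled as plain dictionaries keyed by column name.

```python
# Allowed values copied from the attribute descriptions above.
ALLOWED = {
    "status": {"New", "Approved"},
    "POS": {"Adjective", "Noun", "Verb", "Adverb"},
    "gender": {"Masculine", "Feminine", "Neutral"},
    "number": {"Singular", "Plural", "Uncountable"},
    "term_type": {"Full_form", "Short_form", "Acronym",
                  "Abbreviation", "Phrase", "Variant"},
}
BOOLEAN_COLUMNS = {"forbidden", "preferred", "case", "exact"}


def check_row(row, true_false=("TRUE", "FALSE")):
    """Return a list of problems found in one spreadsheet row.

    `true_false` holds the boolean words of the Excel locale in use,
    for example ("WAHR", "FALSCH") for German.
    Assumption: empty cells are acceptable and are skipped.
    """
    problems = []
    for column, value in row.items():
        if value == "":
            continue
        if column in BOOLEAN_COLUMNS:
            if value.upper() not in true_false:
                problems.append(f"{column}: {value!r} is not a boolean value")
        elif column in ALLOWED and value not in ALLOWED[column]:
            problems.append(f"{column}: {value!r} is not an allowed value")
    return problems


# Sample rows (hypothetical data).
print(check_row({"status": "Approved", "forbidden": "TRUE", "gender": "Neuter"}))
print(check_row({"preferred": "WAHR"}, true_false=("WAHR", "FALSCH")))
```

Columns not listed in either table, such as language columns or notes, pass through unchecked.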
.TBX Files
-
<descrip type="conceptId">
Phrase Concept ID (needed for reimporting updated terms). The Term concept includes the source and all its targets and synonyms.
-
<descrip type="conceptDefinition">
-
<descrip type="conceptDomain">
-
<descrip type="conceptNote">
-
<descrip type="conceptSubdomain">
-
<descrip type="conceptUrl">
-
<langSet xml:lang="cs">
A term's language code based on supported languages.
-
<termNote type="termId">
Phrase Term ID (needed for reimporting updated terms). This is the ID of the specific term in the specific language.
-
<note>
Term's note
-
<termNote type="partOfSpeech">
-
<termNote type="grammaticalGender">
-
<termNote type="grammaticalNumber">
-
<termNote type="usageNote">
-
<termNote type="forbidden">
True or False
-
<termNote type="preferred">
True or False
-
<termNote type="exactMatch">
True or False
-
<termNote type="status">
New or Approved
-
<termNote type="caseSensitive">
True or False
-
<termNote type="createdBy">
Phrase username
-
<termNote type="createdAt">
Unix time
-
<termNote type="lastModifiedBy">
Phrase username
-
<termNote type="lastModifiedAt">
Unix time
-
<termNote type="shortTranslation">
-
<termNote type="termType">
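To show how the elements above fit together, here is a small Python sketch that reads one term entry with the standard library. The nesting used in the sample (termEntry, langSet, tig, term) follows common TBX practice and is an assumption about the file layout, not a confirmed description of a Phrase export; the IDs and the term are invented sample data, and read_terms is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample entry; real files may nest elements differently.
SAMPLE = """<termEntry>
  <descrip type="conceptId">c1</descrip>
  <langSet xml:lang="cs">
    <tig>
      <term>server</term>
      <termNote type="termId">t1</termNote>
      <termNote type="status">Approved</termNote>
      <termNote type="forbidden">False</termNote>
    </tig>
  </langSet>
</termEntry>"""


def read_terms(entry_xml):
    """Collect one dictionary per term, including its termNote values."""
    entry = ET.fromstring(entry_xml)
    concept_id = entry.findtext("descrip[@type='conceptId']")
    terms = []
    for lang_set in entry.iter("langSet"):
        for tig in lang_set.iter("tig"):
            notes = {n.get("type"): n.text for n in tig.findall("termNote")}
            terms.append({
                "concept_id": concept_id,
                "lang": lang_set.get(XML_LANG),
                "term": tig.findtext("term"),
                **notes,
            })
    return terms


for term in read_terms(SAMPLE):
    print(term)
```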
.XLSX files must be formatted in a specific manner before being imported.
To prepare the file, follow these steps:
-
In the .XLSX file, organize all terms into columns with each column representing one language.
-
In the first row, apply the language code for each language.
Example:
-
Save the file.
Synonyms
Synonyms can be accommodated by adding a second column with the same language code.
Example:
Terms with attributes
Terms can be imported with specified attributes. Some are generated by Phrase and are available only in files exported from a Phrase TB.
To apply an attribute to a term, follow these steps:
-
Place a column with the attribute name after each term or synonym column.
-
Place the value of the attribute in the row with the associated term.
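The column layout described above (one column per language, a repeated language code for synonyms, and attribute columns placed after the term column they belong to) can be sketched in Python as follows. The helper names and the sample terms are hypothetical, and the sketch only prints a tab-separated grid; saving the result as .XLSX is left to a spreadsheet application.

```python
def build_header(columns):
    """columns: list of (language_code, [attribute names]) in sheet order."""
    header = []
    for language, attributes in columns:
        header.append(language)
        header.extend(attributes)  # attribute columns follow their term column
    return header


def build_row(cells):
    """cells: list of (term, [attribute values]) in the same order as columns."""
    row = []
    for term, values in cells:
        row.append(term)
        row.extend(values)
    return row


# "en" appears twice: the second column holds a synonym of the first.
columns = [("en", ["status"]), ("en", []), ("de", ["status"])]
header = build_header(columns)

# Sample data, invented for illustration.
row = build_row([
    ("term base", ["Approved"]),
    ("termbase", []),
    ("Terminologiedatenbank", ["New"]),
])

for line in (header, row):
    print("\t".join(line))
```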
Terms with complex morphology
Terms that are being imported follow the same morphology rules as terms created directly in a term base.
Apart from working with synonyms and Fuzzy/Exact matches, a pipe character can be added as a boundary between the word stem (the part that does not change) and the suffix (the part that does change).
Example:
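As a rough illustration of the pipe boundary, the Python sketch below splits a term into stem and suffix. The sample term and both function names are hypothetical, and the prefix check only illustrates the idea of a fixed stem; it is not a description of how Phrase matches terms.

```python
def split_term(term):
    """Split 'stem|suffix' at the first pipe; the suffix is empty without one."""
    stem, _, suffix = term.partition("|")
    return stem, suffix


def shares_stem(word, term):
    """Illustration only: does `word` begin with the stem of `term`?"""
    stem, _ = split_term(term)
    return word.lower().startswith(stem.lower())


print(split_term("translat|ion"))
print(shares_stem("translating", "translat|ion"))
print(shares_stem("transfer", "translat|ion"))
```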
The .TBX format is supported for terminology imports (and exports). The .TBX standard is considered a loose standard. If a .TBX file is imported from another CAT tool, some metadata may not get imported.
If importing terminology between two term bases, use the .TBX format. Inside the Phrase environment, data will be correctly imported.
SDL Trados uses a special TBX.XML format and it has different specifications for import.
Multiterm .TBX
The import process from Multiterm .TBX files has been optimized and the following metadata will be imported:
-
Timestamps (created at, last modified at)
-
The value of the <descrip type="usageNote"> element, imported to the usage attribute of all the terms of the concept
-
The value of the <descrip type="note"> element, imported to the note attribute of all the terms of the concept
SDL Trados does not support the .TBX format for term bases and uses the .XML format with a TBX schema. Importing this .XML format is supported but not with all attributes.
Attributes specified for the whole term concept will be added to every individual term's Note (each language, each synonym, etc.)
Imported attributes:
-
Source
-
Target
-
Synonyms
-
Date of Creation
-
Date of Modification
-
Names of Author and Reviewer
These will be imported only if the name is the same as the username of an existing Phrase user. Either edit the names in the TBX.xml or add the users to Phrase.
-
Customized Attributes
These will be imported into the term’s Note. Every attribute will have a separate line starting with the attribute’s name. For example:
-
Origin: Wikipedia
-
Theme: Law
-
Status: New
-
Edit the TBX.xml before import
To make the best use of your data, edit the TBX.xml file before importing it. To edit the file, open it in a text editor, such as Notepad++, that supports multiline regular expressions in its search-and-replace feature.
Editing note, usage and status
Customized attributes in TBX.xml files have the following format. Actual names of the attributes will be different since they are not standardized:
<descripGrp>
  <descrip type="Comment">term =API= should not be translated</descrip>
</descripGrp>
<descripGrp>
  <descrip type="Definition">API = application programming interface</descrip>
</descripGrp>
<descripGrp>
  <descrip type="Example">Phrase offers a set of API calls.</descrip>
</descripGrp>
<descripGrp>
  <descrip type="Status">confirmed</descrip>
</descripGrp>
These attributes will be automatically imported into the Note:
-
Comment: term =API= should not be translated
-
Definition: API = application programming interface
-
Example: Phrase offers a set of API calls
-
Status: confirmed
To change this behavior and import, for example:
-
Only the Comment as a Note
-
Example as Usage
-
Status as Approved or New
-
Definition not imported at all
Edit the TBX.xml file to fit the standard of the Phrase format for .TBX files:
<note>term =API= should not be translated</note>
<termNote type="usageNote">Phrase offers a set of API calls.</termNote>
<termNote type="status">Approved</termNote>
Changing Comment to Note
Search:
<descripGrp>.[^\<]+<descrip type="Comment">([^\<]+)</descrip>.[^\<]+</descripGrp>
Replace:
<note>\1</note>
Changing Example to Usage
Search:
<descripGrp>.[^\<]+<descrip type="Example">([^\<]+)</descrip>.[^\<]+</descripGrp>
Replace:
<termNote type="usageNote">\1</termNote>
Setting Status to Approved
Search:
<descripGrp>.[^\<]+<descrip type="Status">[^\<]+</descrip>.[^\<]+</descripGrp>
Replace:
<termNote type="status">Approved</termNote>
Deleting Definition
Search:
<descripGrp>.[^\<]+<descrip type="Definition">([^\<]+)</descrip>.[^\<]+</descripGrp>
Replace:
Leave the Replace field empty.
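The four search-and-replace operations above can also be scripted. The following Python sketch applies the same patterns with the standard re module, under two assumptions: the file uses Windows (CRLF) line endings, and re.DOTALL stands in for the editor setting that lets the dot match a line break. The function names are hypothetical, and the sample text mirrors the example shown earlier.

```python
import re


def descrip_group(kind, capture=True):
    """Build the pattern for one <descripGrp> holding a descrip of `kind`."""
    value = "([^<]+)" if capture else "[^<]+"
    return re.compile(
        r'<descripGrp>.[^<]+<descrip type="%s">%s</descrip>.[^<]+</descripGrp>'
        % (kind, value),
        re.DOTALL,  # lets "." match the carriage return of a CRLF line ending
    )


def rewrite(xml):
    xml = descrip_group("Comment").sub(r"<note>\1</note>", xml)
    xml = descrip_group("Example").sub(
        r'<termNote type="usageNote">\1</termNote>', xml)
    xml = descrip_group("Status", capture=False).sub(
        '<termNote type="status">Approved</termNote>', xml)
    xml = descrip_group("Definition").sub("", xml)  # drop the definition
    return xml


# Sample input with CRLF line endings (assumed, as produced on Windows).
SAMPLE = "\r\n".join([
    '<descripGrp>',
    '  <descrip type="Comment">term =API= should not be translated</descrip>',
    '</descripGrp>',
    '<descripGrp>',
    '  <descrip type="Definition">API = application programming interface</descrip>',
    '</descripGrp>',
    '<descripGrp>',
    '  <descrip type="Example">Phrase offers a set of API calls.</descrip>',
    '</descripGrp>',
    '<descripGrp>',
    '  <descrip type="Status">confirmed</descrip>',
    '</descripGrp>',
])

print(rewrite(SAMPLE))
```

Each pattern needs at least two characters between neighboring tags, so a file with bare LF line endings and unindented closing tags would not match without adjusting the patterns.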
Adding an author to note
Remove the author from the origination <transacGrp> element and add it to a <descrip> element.
<transacGrp>
  <transac type="terminologyManagementTransactions">origination</transac>
  <date>2006-09-27T11:25:19</date>
  <transacNote type="responsibility">MikeS</transacNote>
</transacGrp>
should be replaced by:
<transacGrp>
  <transac type="terminologyManagementTransactions">origination</transac>
  <date>2006-09-27T11:25:19</date>
</transacGrp>
<descripGrp>
<descrip type="author">MikeS</descrip>
</descripGrp>
The regular expression will be:
Search:
(origination</transac>.[^\<]+<date>[^\<]+</date>.[^\<]+)<transacNote type="responsibility">([^\<]+)</transacNote>.[^\<]+</transacGrp>
Replace:
\1</transacGrp>\r\n<descripGrp>\r\n<descrip type="author">\2</descrip>\r\n</descripGrp>
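The same substitution can be run outside a text editor. This Python sketch wraps the search and replace expressions above in a hypothetical helper, move_responsibility, assuming CRLF line endings and using re.DOTALL so that the dot matches a line break.

```python
import re


def move_responsibility(xml, transaction, descrip_type):
    """Move the responsible user of `transaction` into a new <descripGrp>."""
    pattern = re.compile(
        "(" + re.escape(transaction)
        + r"</transac>.[^<]+<date>[^<]+</date>.[^<]+)"
        + r'<transacNote type="responsibility">([^<]+)</transacNote>'
        + r".[^<]+</transacGrp>",
        re.DOTALL,
    )
    replacement = (
        r"\1</transacGrp>"
        + "\r\n<descripGrp>\r\n"
        + '<descrip type="' + descrip_type + '">'
        + r"\2</descrip>"
        + "\r\n</descripGrp>"
    )
    return pattern.sub(replacement, xml)


# Sample input with CRLF line endings (assumed).
SAMPLE = "\r\n".join([
    '<transacGrp>',
    '  <transac type="terminologyManagementTransactions">origination</transac>',
    '  <date>2006-09-27T11:25:19</date>',
    '  <transacNote type="responsibility">MikeS</transacNote>',
    '</transacGrp>',
])

print(move_responsibility(SAMPLE, "origination", "author"))
```

Calling the helper with "modification" as the transaction and a different descrip type handles the editor of a term in the same way.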
Adding edited by to a note
To add Edited by to a Note, remove the editor from the modification <transacGrp> element and add it to a <descrip> element.
<transacGrp>
  <transac type="terminologyManagementTransactions">modification</transac>
  <date>2006-09-27T11:25:19</date>
  <transacNote type="responsibility">lauraB</transacNote>
</transacGrp>
should be replaced by:
<transacGrp>
  <transac type="terminologyManagementTransactions">modification</transac>
  <date>2006-09-27T11:25:19</date>
</transacGrp>
<descripGrp>
<descrip type="Edited by">lauraB</descrip>
</descripGrp>
The regular expression will be:
Search:
(modification</transac>.[^\<]+<date>[^\<]+</date>.[^\<]+)<transacNote type="responsibility">([^\<]+)</transacNote>.[^\<]+</transacGrp>
Replace:
\1</transacGrp>\r\n<descripGrp>\r\n<descrip type="Edited by">\2</descrip>\r\n</descripGrp>
If the administrator has provided rights, terms can be exported to an .XLSX file for modification before being imported back. This can be used for bulk changes or deletions. Term bases can only be exported one by one.
The following procedure requires that the .XLSX file is imported into the same term base it was exported from. If imported into a different term base, the terms will be duplicated instead of being updated and terms marked for deletion will not be removed.
It is recommended to make a second export in .TBX format as a backup in case the modified import turns out to be incorrect.
To modify terms externally, follow these steps:
-
From a term base page, click Export.
The export window opens.
-
Select XLSX as the file format.
-
Select term attributes for export.
-
Click Export.
The .XLSX file is created and downloaded to the system.
-
Modify the .XLSX file with required changes without deleting CID or TID information.
Metadata (creation date, modification date, etc.) can be used to filter the terms in the spreadsheet application to only display a specific group.
-
Updating
To update a term, rewrite the existing term in the column for the given language.
Unlike translation memories, the |update suffix is not required but will work correctly if added to a CID or TID.
New terms can be added to the column for the given language as additional rows and will be imported as new to the existing term base.
-
Deleting
Terms can be deleted by:
-
Adding |delete as a suffix to a CID of a term to remove the term from all languages.
-
Adding |delete as a suffix to a TID of a term to delete the term from a specific language.
Example of terms being set for deletion in all languages (CID column) and a specific language (TID column):
-
-
-
Save the .XLSX file and import it back with the Update existing terms option.
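The deletion markers described above can be added programmatically when many terms are affected. The Python sketch below models each spreadsheet row as a dictionary with a single CID and a single TID value, which is a simplification: a real export may hold several TID columns per row. The function name and the IDs are hypothetical.

```python
def mark_for_deletion(rows, concept_ids=frozenset(), term_ids=frozenset()):
    """Append |delete to the listed concept IDs and term IDs, in place."""
    for row in rows:
        if row.get("CID") in concept_ids:
            row["CID"] += "|delete"  # removes the term from all languages
        if row.get("TID") in term_ids:
            row["TID"] += "|delete"  # removes the term from one language
    return rows


# Hypothetical IDs; real values come from the exported file.
rows = [
    {"CID": "c1", "TID": "t1", "en": "obsolete term"},
    {"CID": "c2", "TID": "t2", "en": "misspelled synonym"},
]
mark_for_deletion(rows, concept_ids={"c1"}, term_ids={"t2"})

for row in rows:
    print(row["CID"], row["TID"], row["en"])
```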