.XML - Extensible Markup Language (TMS)

Content is machine translated from English by Phrase Language AI.

The .XML file format is not designed for translation and requires additional settings for successful import.

Default settings are marked with an asterisk (*) and will import all XML elements for translation. Import options can be used to change the import behavior.

AI services can be very effective at identifying format problems in .XML based files.

File Types

.XML

Import Options

Plain import rules

Elements

Only selected elements (e.g. name, title, para) are imported. An asterisk (*) imports all elements.
Attributes

Only selected attributes (e.g. name, title, para) are imported. An asterisk (*) imports all attributes.
Translatable inline elements

If the Identify inline elements automatically option is selected, all elements in the translatable text are imported as Translatable inline elements.
Non-translatable inline elements

Selected inline elements (e.g. name, title, para) are converted into tags and their content is not translatable.

Important

Issues with tags are a common cause of export errors (e.g. File couldn't be generated), especially for file types such as spreadsheets (MS Excel based) and .XML. Always ensure tags and formatting are correct before exporting files by running quality assurance checks.
Identify inline elements automatically

Elements that are neighbors of text nodes will be automatically converted to inline tags.
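
The rule above can be illustrated with a minimal Python sketch. It is not Phrase's implementation; the function names and the sample document are hypothetical, and "neighbor of a text node" is assumed to mean that non-whitespace text directly precedes or follows the element inside its parent.

```python
import xml.etree.ElementTree as ET

def has_text(value):
    """True if the string exists and contains more than whitespace."""
    return bool(value and value.strip())

def inline_candidates(root):
    """Return tag names of elements that sit next to a non-empty text node."""
    found = []
    for parent in root.iter():
        preceding = parent.text  # text before the first child
        for child in parent:
            # A child touches text if text precedes it or follows it (its tail).
            if has_text(preceding) or has_text(child.tail):
                found.append(child.tag)
            preceding = child.tail
    return found

# Hypothetical sample: <b> is surrounded by text, <item> elements are not.
doc = ET.fromstring(
    "<doc><para>Click <b>Save</b> to continue.</para>"
    "<list><item>One</item><item>Two</item></list></doc>"
)
print(inline_candidates(doc))  # prints ['b']
```

Under this assumption, only b would become an inline tag, while para and item remain structural elements.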
Elements (processed as HTML)

Selected element code is processed as .HTML. .HTML import settings such as Preserve Whitespaces or Break tag (<br/>) creates new segment can be used for these elements.

Use this option when the selected element value contains .HTML markup. It does not apply to children of the selected element, unless otherwise specified.
Locked elements

The selected elements will be imported as Locked.
Locked attributes

The selected attributes will be imported as Locked.
Convert to character entities

Enter a comma-separated list of characters or character references to be written as character entities in the output file.

Example:

To export quotation marks (") as &quot; and the character Σ as &#931;, enter &quot;,&#931;. The characters & and < are always exported as &amp; and &lt; respectively.
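
As a rough sketch of the effect on exported text, the following Python snippet escapes & and < unconditionally and then applies a configurable mapping. The function name, the mapping format, and the choice of &quot; and &#931; as the references for the quotation mark and Σ are assumptions made for illustration, not the actual export logic.

```python
def export_text(text, entities):
    """Escape & and < always, then replace each configured character.

    `entities` is a hypothetical mapping: character -> reference to write.
    """
    # & must be escaped first so that the references added below stay intact.
    out = text.replace("&", "&amp;").replace("<", "&lt;")
    for char, reference in entities.items():
        out = out.replace(char, reference)
    return out

rules = {'"': "&quot;", "Σ": "&#931;"}  # assumed configuration
print(export_text('Sum "Σ" & <x', rules))
# prints Sum &quot;&#931;&quot; &amp; &lt;x
```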
Convert to Phrase TMS tags

Apply regular expressions to convert specified text to tags.
Parse ICU messages

ICU messages are automatically converted to tags. When a segment contains inline elements, ICU parsing for that segment is skipped. Segments without inline elements are parsed normally.
Import XML entities

XML entities in DTD Declaration will be imported for translation.
Expand custom general entities
Import comments

Comments are not imported if elements are processed as HTML as indicated in the Elements (processed as HTML) option.
Exclude subelements from segmentation

Select to prevent segmentation inside XML pair tags or subelements. This is useful if the XML contains nested structures where segmentation would break the logical meaning of the text.
Create XSLT preview file

Upload an .XSL stylesheet to generate a readable preview file.

XML settings using XPath

Using the XPath query language allows for the creation of complex import rules and some additional features unavailable in plain import rules.

The XPath expression should define the elements and/or attributes whose text or value is to be translated, not the text node itself.

Familiarity with XPath is recommended before using.

Context note, Context key, and Max. target length will not be processed for files with more than 10,000 XML elements.

Context key

Constitutes TM context (101% matches) if applicable.
Context note

Specify the elements or attributes to import as a context note for each element.
Max. target length

Specify the elements or attributes that define the maximum target length for each element. The character limit for each segment is displayed in the Context note pane inside the editor. Characters exceeding the limit are highlighted in red.
Preserve whitespaces

Keep empty to preserve whitespaces only in elements that carry xml:space='preserve'. Enter //* to preserve whitespaces in all elements, or use an arbitrary XPath expression.
Nodes excluded from segmentation

Specify XML elements or attributes that should not be segmented. Enter an XPath expression that identifies the nodes to be excluded. Any text extracted from these nodes will be kept as a single segment rather than split into smaller units.

Enter //element[@attr='value'] to exclude all <element> nodes that contain the attribute attr="value" from segmentation.
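
To check which nodes such an expression selects before importing, a quick test outside Phrase can help. The sketch below uses Python's standard xml.etree.ElementTree module, which supports only a subset of XPath (the leading // is written as .// here); the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<root>"
    "<element attr='value'>Keep. This whole.</element>"
    "<element attr='other'>Split me. Please.</element>"
    "<element>No attribute.</element>"
    "</root>"
)

# Equivalent of //element[@attr='value'] in ElementTree's XPath subset.
excluded = doc.findall(".//element[@attr='value']")
print([node.text for node in excluded])  # prints ['Keep. This whole.']
```

Only the first element matches, so only its text would be kept as a single segment.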

HTML preview with XSLT stylesheet

XSLT (Extensible Stylesheet Language Transformations) can be used to transform .XML documents into .HTML format for in-context preview purposes. Accordingly, preview files downloaded via Preview translation in the Document menu have an .HTML extension. Phrase currently supports XSLT 2.0.

The XSLT used for preview must be based on the target, not the source.

Click Choose file to import a stylesheet.

Click Download XSLT to download the stylesheet after file import.

CDATA in XML file

CDATA stands for character data: a block of text that the parser does not process, even though it contains characters that would otherwise be recognized as markup. Predefined entities such as &lt;, &gt;, and &amp; are tedious to type and make the markup difficult to read. In such cases, a CDATA section can be used instead.

If CDATA contains embedded .HTML, the corresponding XML elements should be listed under Elements (processed as HTML).

If the source file contains CDATA and the Segment XML option is used, CDATA is added to every segment in the Completed file.

CDATA will only be segmented if there is a clear indication of a segment break such as punctuation or spacing.

Source:

<text><![CDATA[Translatable text A. Translatable text B.]]></text>

Target:

<text><![CDATA[Translatable text A.]]><![CDATA[ ]]><![CDATA[Translatable text B.]]></text>

The Completed file is valid .XML and the XML viewer will display the text correctly as Translatable text A. Translatable text B.
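
That adjacent CDATA sections read as one continuous text can be confirmed with any standard XML parser. A minimal check using Python's xml.etree.ElementTree, with the source and target strings copied from the example above:

```python
import xml.etree.ElementTree as ET

source = "<text><![CDATA[Translatable text A. Translatable text B.]]></text>"
target = (
    "<text><![CDATA[Translatable text A.]]><![CDATA[ ]]>"
    "<![CDATA[Translatable text B.]]></text>"
)

source_text = ET.fromstring(source).text
target_text = ET.fromstring(target).text

# The parser merges consecutive CDATA sections into a single text value.
print(target_text)                 # prints Translatable text A. Translatable text B.
print(source_text == target_text)  # prints True
```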

Application Specific Settings

WordPress XML

Recommended settings for WordPress XML:

XML

XPath
Elements & attributes

//*[local-name()='encoded']|//description|//title
Elements (processed as HTML)

//*[local-name()='encoded']|//description|//title
Convert to Phrase tags

(\[[^\]]++\])++

Select Preserve whitespaces under HTML settings.
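
To see what the regular expression in Convert to Phrase tags captures, it can be tried on a sample string. The sketch below is an approximation: it uses the non-possessive form of the pattern so that it also runs on Python versions older than 3.11 (possessive quantifiers such as ++ were added to the re module in 3.11), and for input like this it should match the same text. The sample string is hypothetical.

```python
import re

# Non-possessive equivalent of (\[[^\]]++\])++ : one or more adjacent
# bracketed runs, such as WordPress shortcodes.
SHORTCODE = re.compile(r"(?:\[[^\]]+\])+")

text = 'Intro [gallery ids="1,2"] text [caption][/caption] end.'
print(SHORTCODE.findall(text))
# prints ['[gallery ids="1,2"]', '[caption][/caption]']
```

Adjacent bracketed runs are captured as a single match, so they would become one tag rather than two.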

Multilingual XML

Multilingual files are imported as multiple bilingual jobs with languages mapped before import. They are marked with a dedicated icon in the jobs table. If imported into several target languages, the Completed file is composed of all target languages.

Phrase supports XML files that have both source and target elements present for all paragraphs, even if the target is empty. When the source and target segmentation differ, the source segmentation takes precedence.

Individual language elements must all be descendants of the same trans-unit element and one language cannot be contained within the other. Source and target content cannot be stored in attribute values. If multiple elements match the XPath for source or target inside the trans-unit element, only the first one is imported for translation.

When creating a job, select Multilingual XML from the File Type pane before applying Import Options. If not specified, the file will be imported as standard .XML.

Tag content of the source .XML file can be displayed in the editor by clicking Expand tags under the Tool menu and edited by pressing F2.

Example:

Sample of partially translated text from English to German and French. All <tuv lang="en">, <tuv lang="de"> and <tuv lang="fr"> are children of the same <tu> element.

<?xml version="1.0" encoding="utf-8"?>
<root>
Not translatable text.
<tu note="context note" key="ID 254" maxlen="16"> 
  <tuv lang="en">
    <seg>First segment.</seg>
  </tuv>
  <tuv lang="de">
    <seg>Erste segment</seg>
  </tuv>
  <tuv lang="fr">
    <seg></seg>
  </tuv>
</tu>
<tu note="another context note" key="ID 255" maxlen="18"> 
  <tuv lang="en">
    <seg>Second segment.</seg>
  </tuv>
  <tuv lang="de">
    <seg></seg>
  </tuv>
  <tuv lang="fr">
    <seg></seg>
  </tuv>
</tu>
</root>

Import Options

For the import of Multilingual .XML files, the XPath query language must be used. See example above for reference. The XPath expression defines the elements in which the text/value should be translated and not the actual text node.
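
How a unit expression such as //tu and a relative source expression such as tuv[@lang='en']/seg resolve against the sample above can be sketched in Python. This is an illustration only, not the Phrase import logic; the helper names are hypothetical, the sample is condensed, and xml.etree.ElementTree supports only a subset of XPath (// is written as .//).

```python
import xml.etree.ElementTree as ET

SAMPLE = """<root>
Not translatable text.
<tu note="context note" key="ID 254" maxlen="16">
  <tuv lang="en"><seg>First segment.</seg></tuv>
  <tuv lang="de"><seg>Erste segment</seg></tuv>
  <tuv lang="fr"><seg></seg></tuv>
</tu>
<tu note="another context note" key="ID 255" maxlen="18">
  <tuv lang="en"><seg>Second segment.</seg></tuv>
  <tuv lang="de"><seg></seg></tuv>
  <tuv lang="fr"><seg></seg></tuv>
</tu>
</root>"""

def seg_text(tu, lang):
    """Text of the first matching <seg>, evaluated relative to the <tu>."""
    seg = tu.find(f"tuv[@lang='{lang}']/seg")  # find() returns the first match only
    return (seg.text or "") if seg is not None else ""

def read_units(xml_text, source_lang, target_lang):
    root = ET.fromstring(xml_text)
    return [
        {
            "key": tu.get("key"),
            "source": seg_text(tu, source_lang),
            "target": seg_text(tu, target_lang),
        }
        for tu in root.findall(".//tu")
    ]

for unit in read_units(SAMPLE, "en", "de"):
    print(unit)
# prints {'key': 'ID 254', 'source': 'First segment.', 'target': 'Erste segment'}
# prints {'key': 'ID 255', 'source': 'Second segment.', 'target': ''}
```

Text outside the tu elements ("Not translatable text.") is never selected, and an empty target element simply yields an empty string.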

Elements containing source and target sub-elements

//tu
Elements containing source text

tuv[@lang='en']/seg (in relation to the parent element //tu)
Non-translatable inline elements

All elements in source or target are considered Translatable inline elements unless specified here as Non-translatable inline elements.
Context key

Specify a context key that is saved with the segment to the translation memory and used for match context.
Context note

Specify the elements or attributes to import as a context note for each element.
Max. target length

Specify the elements or attributes that define the maximum target length for each element.
Convert to character entities

Enter a comma-separated list of characters or character references to be written as character entities in the output file.

Example:

To export quotation marks (") as &quot; and the character Σ as &#931;, enter &quot;,&#931;. The characters & and < are always exported as &amp; and &lt; respectively.
Convert to Phrase TMS tags

Apply regular expressions to convert specified text to tags.
Parse ICU messages

ICU messages are automatically converted to tags. When a segment contains inline elements, ICU parsing for that segment is skipped. Segments without inline elements are parsed normally.
Use HTML subfilter

Imports HTML tags contained in the file. Tags can then be used with HTML File Import Settings. Paragraph tags <p> will create new segments even if Segment Multilingual XML is unselected.
Segment multilingual XML

Text is segmented by the general segmentation rules rather than imported as one segment per element.

Caution

Applying Segment multilingual XML to a file that contains target text may result in a different number of segments in the source than in the target.
Set segment status of non-empty target

Select default confirmation status and whether confirmed segments are automatically added to TM.
Create XSLT preview file

Upload an .XSL stylesheet to generate a readable preview file.

Example:

If a multilingual .XML file uses namespaces, the XPath expressions could be the following:

Elements containing source and target sub-elements

//*[local-name()='trans-unit']
Elements containing source text

*[local-name()='source']
Elements containing target text

*[local-name()='target']
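
The effect of local-name() is to match elements by name regardless of their namespace. Python's xml.etree.ElementTree does not support local-name() in its XPath subset, so the sketch below emulates it with a small hypothetical helper; the sample file and its namespace URI are made up for illustration.

```python
import xml.etree.ElementTree as ET

NAMESPACED = """<file xmlns="urn:example:ns">
  <trans-unit id="1"><source>Hello</source><target>Hallo</target></trans-unit>
  <trans-unit id="2"><source>World</source><target/></trans-unit>
</file>"""

def local_name(element):
    """Tag name without the '{namespace}' prefix ElementTree adds."""
    return element.tag.rsplit("}", 1)[-1]

def first_child(parent, name):
    """First direct child whose local name matches, or None."""
    return next((c for c in parent if local_name(c) == name), None)

root = ET.fromstring(NAMESPACED)
pairs = []
for unit in root.iter():
    if local_name(unit) != "trans-unit":
        continue
    source = first_child(unit, "source")
    target = first_child(unit, "target")
    pairs.append((source.text, target.text or ""))

print(pairs)  # prints [('Hello', 'Hallo'), ('World', '')]
```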
