File Import Settings

.XML - Extensible Markup Language (TMS)

The .XML file format is not designed for translation and requires additional settings for successful import.

Default settings are marked with an asterisk (*) and will import all XML elements for translation. Import options can be used to change the import behavior.

AI chatbots can be very effective at identifying format problems in .XML based files.

File Types

  • .XML

Import Options

Plain import rules

  • Elements

    Only selected elements (e.g. name, title, para) are imported. An asterisk (*) imports all elements.

  • Attributes

    Only selected attributes (e.g. name, title, para) are imported. An asterisk (*) imports all attributes.

  • Translatable inline elements

    If the Identify inline elements automatically option is selected, all elements in the translatable text are imported as Translatable inline elements.

  • Non-translatable inline elements

    Selected inline elements (e.g. name, title, para) are converted into tags and their content is not translatable.

  • Identify inline elements automatically

    Elements that are neighbors of text nodes will be automatically converted to inline tags.

  • Elements (processed as HTML)

    Selected element code is processed as .HTML. .HTML import settings such as Preserve Whitespaces or Break tag (<br/>) creates new segment can be used for these elements.

    Use this option when the selected element value contains .HTML markup. It does not apply to children of the selected element, unless otherwise specified.

  • Locked elements

    The selected elements will be imported as Locked.

  • Locked attributes

    The selected attributes will be imported as Locked.

  • Parse ICU messages

    ICU messages are automatically converted to tags. Files with ICU messages cannot contain any inline elements.

  • Import XML entities

    XML entities declared in the DTD are imported for translation.

  • Segment XML

    Deselect if segmentation is not desired.

  • Import comments

    Comments are not imported if elements are processed as HTML as indicated in the Elements (processed as HTML) option.

  • Convert to Phrase TMS tags 

    Apply regular expressions to convert specified text to tags.

  • Convert to character entities

    Enter a comma-separated list of character references to be used in the output file.

    Example:

    To export quotation marks (") as &quot; and the character Σ as &#x3A3;, enter &quot;,&#x3A3;. The characters & and < are always exported as &amp; and &lt; respectively.
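
The export behavior described for Convert to character entities can be sketched in Python. This is only an illustration of the rule, not Phrase's implementation; the function names parse_entity_list and export_text are hypothetical, and the way the setting is parsed is an assumption.

```python
import html


def parse_entity_list(setting):
    """Map each character to the reference that should replace it.

    `setting` is a comma-separated list such as '&quot;,&#x3A3;'.
    """
    mapping = {}
    for ref in setting.split(","):
        ref = ref.strip()
        if ref:
            # html.unescape resolves both named and numeric references.
            mapping[html.unescape(ref)] = ref
    return mapping


def export_text(text, setting):
    mapping = parse_entity_list(setting)
    # & and < are always escaped, regardless of the setting.
    mapping.setdefault("&", "&amp;")
    mapping.setdefault("<", "&lt;")
    return "".join(mapping.get(ch, ch) for ch in text)


print(export_text('Say "Σ" & <go>', "&quot;,&#x3A3;"))
# prints: Say &quot;&#x3A3;&quot; &amp; &lt;go>
```

Characters that are not listed, such as > in this example, are left unchanged.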

XML settings using XPath

Using the XPath query language allows for the creation of complex import rules and some additional features unavailable in plain import rules.

The XPath expression should select the elements and/or attributes whose text or value is to be translated, not the text node itself.

Familiarity with XPath is recommended before using this option.
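
To illustrate the difference between selecting an element and selecting its text node, here is a small sketch using Python's standard xml.etree.ElementTree module, which supports only a subset of XPath. The sample document and its element names are made up for the illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<doc>
  <item id="a1" label="Short label"><title>Hello</title><code>X-1</code></item>
  <item id="a2" label="Other label"><title>World</title><code>X-2</code></item>
</doc>"""

root = ET.fromstring(SAMPLE)

# The expression selects <title> elements; the translatable text is then
# read from each selected element instead of selecting text() directly.
titles = [el.text for el in root.findall(".//title")]

# ElementTree cannot return attributes as results, so the attribute value
# is read from the element that carries it.
labels = [el.get("label") for el in root.findall(".//item[@label]")]

print(titles)  # ['Hello', 'World']
print(labels)  # ['Short label', 'Other label']
```

The <code> elements are not selected by either expression, so their content would stay out of translation.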

Context note, Context key, and Max. target length will not be processed for files with more than 10,000 XML elements.

  • Context key

    Constitutes TM context (101% matches) if applicable.

  • Context note

    Import elements or context attributes for each element.

  • Max. target length

    Import elements or the maximum target length for each element. The character limit for each segment is displayed on the Context note pane inside the editor. Any character exceeding the limit is highlighted in red.

  • Preserve whitespaces

    Keep empty to preserve whitespaces only in elements with xml:space='preserve'. Enter //* to preserve all whitespaces in all elements, or use an arbitrary XPath expression.
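
Because Context note, Context key, and Max. target length are skipped for files with more than 10,000 XML elements, it can be useful to count elements before import. The following is a rough sketch; how Phrase counts elements internally is not documented here, so counting every element including the root is an assumption, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

ELEMENT_LIMIT = 10_000  # limit stated above for the context-related options


def count_elements(xml_text):
    """Return the number of elements in the document, root included."""
    root = ET.fromstring(xml_text)
    return sum(1 for _ in root.iter())


def context_options_apply(xml_text):
    """True if the file stays within the stated element limit."""
    return count_elements(xml_text) <= ELEMENT_LIMIT


small = "<root>" + "<p>text</p>" * 3 + "</root>"
print(count_elements(small))         # 4: the root plus three <p> elements
print(context_options_apply(small))  # True
```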

HTML preview with XSLT stylesheet

XSLT (Extensible Stylesheet Language Transformations) can be used to transform .XML documents into .HTML format for in-context preview purposes. Accordingly, preview files downloaded via Preview translation in the Document menu have the .HTML extension. Phrase currently supports XSLT 2.0.

Click Choose file to import a stylesheet.

Click Download XSLT to download the stylesheet after file import.

CDATA in XML file

CDATA stands for character data. A CDATA section is a block of text that the parser does not interpret as markup, even if it contains characters that would otherwise be recognized as markup. Outside CDATA, characters such as <, >, and & must be written as the predefined entities &lt;, &gt;, and &amp;, which are tedious to type and hard to read. In such cases, a CDATA section can be used instead.

If CDATA contains embedded .HTML, the corresponding XML elements should be listed under Elements (processed as HTML).

If the source file contains CDATA and the Segment XML option is selected, a separate CDATA section is written for every segment in the Completed file.

CDATA will only be segmented if there is a clear indication of a segment break such as punctuation or spacing.

Source:

<text><![CDATA[Translatable text A. Translatable text B.]]></text>

Target:

<text><![CDATA[Translatable text A.]]><![CDATA[ ]]><![CDATA[Translatable text B.]]></text>

The Completed file is valid .XML and an XML viewer will display the text correctly as Translatable text A. Translatable text B.
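
That the split CDATA sections are equivalent to the original single section can be checked with any XML parser. A minimal sketch with Python's standard xml.etree.ElementTree, using the source and target strings from the example above:

```python
import xml.etree.ElementTree as ET

source = "<text><![CDATA[Translatable text A. Translatable text B.]]></text>"
target = ("<text><![CDATA[Translatable text A.]]><![CDATA[ ]]>"
          "<![CDATA[Translatable text B.]]></text>")

# The parser joins adjacent CDATA sections into one run of character data.
source_text = ET.fromstring(source).text
target_text = ET.fromstring(target).text

print(target_text)                 # Translatable text A. Translatable text B.
print(source_text == target_text)  # True
```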

Application Specific Settings

WordPress XML

Recommended settings for WordPress XML:

  • XML

    XPath

  • Elements & attributes

    //*[local-name()='encoded']|//description|//title

  • Elements (processed as HTML)

    //*[local-name()='encoded']|//description|//title

  • Convert to Phrase TMS tags

    (\[[^\]]++\])++

Select Preserve whitespaces under HTML settings.
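
The regular expression recommended above targets bracketed WordPress shortcodes. The sketch below shows what it matches on a made-up string. Two things are assumptions: the pattern is rewritten with ordinary greedy quantifiers, because possessive quantifiers (++) are only available in Python 3.11 and later and here should not change what is matched, and the numbered {1}-style placeholders are an illustration only, not the actual tag format.

```python
import re
from itertools import count

# Greedy equivalent of (\[[^\]]++\])++ from the settings above.
SHORTCODE = re.compile(r"(?:\[[^\]]+\])+")

text = '[caption id="att_1"]A photo[/caption] and [gallery][video src="a.mp4"]'

# Each match is one or more directly adjacent bracketed shortcodes.
matches = [m.group(0) for m in SHORTCODE.finditer(text)]
print(matches)
# ['[caption id="att_1"]', '[/caption]', '[gallery][video src="a.mp4"]']

# Replace each match with a numbered placeholder to mimic tag conversion.
numbers = count(1)
tagged = SHORTCODE.sub(lambda m: "{%d}" % next(numbers), text)
print(tagged)  # {1}A photo{2} and {3}
```

Note that two shortcodes with nothing between them are matched as a single unit, so they would end up in one tag.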

Multilingual XML

Multilingual files are imported as multiple bilingual jobs with languages mapped before import. They are marked with a dedicated multilingual XML icon in the jobs table. If imported into several target languages, the Completed file is composed of all target languages.

Phrase supports XML files that have both source and target elements present for all paragraphs, even if the target is empty. When the source and target segmentation differ, the source segmentation takes precedence.

Individual language elements must all be descendants of the same trans-unit element and one language cannot be contained within the other. Source and target content cannot be stored in attribute values. If multiple elements match the XPath for source or target inside the trans-unit element, only the first one is imported for translation.

  • When creating a job, select Multilingual XML from the File Type pane before applying Import Options. If not specified, the file will be imported as standard .XML.

  • Tag content of the source .XML file can be displayed in the editor by clicking Expand tags under the Tool menu and edited by pressing F2.

Example:

Sample of partially translated text from English to German and French. All <tuv lang="en">, <tuv lang="de"> and <tuv lang="fr"> are children of the same <tu> element.

<?xml version="1.0" encoding="utf-8"?>
<root>
Not translatable text.
<tu note="context note" key="ID 254" maxlen="16"> 
  <tuv lang="en">
    <seg>First segment.</seg>
  </tuv>
  <tuv lang="de">
    <seg>Erste segment</seg>
  </tuv>
  <tuv lang="fr">
    <seg></seg>
  </tuv>
</tu>
<tu note="another context note" key="ID 255" maxlen="18"> 
  <tuv lang="en">
    <seg>Second segment.</seg>
  </tuv>
  <tuv lang="de">
    <seg></seg>
  </tuv>
  <tuv lang="fr">
    <seg></seg>
  </tuv>
</tu>
</root>
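
The note, key, and maxlen attributes in the sample are the kind of data that the Context note, Context key, and Max. target length options can pick up. As a rough sketch, the snippet below reads them with Python's standard xml.etree.ElementTree and checks whether the existing target text fits the limit. Treating maxlen as a character count of the whole <seg> text is an assumption, and the function name check_lengths is hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<root>
Not translatable text.
<tu note="context note" key="ID 254" maxlen="16">
  <tuv lang="en"><seg>First segment.</seg></tuv>
  <tuv lang="de"><seg>Erste segment</seg></tuv>
  <tuv lang="fr"><seg></seg></tuv>
</tu>
<tu note="another context note" key="ID 255" maxlen="18">
  <tuv lang="en"><seg>Second segment.</seg></tuv>
  <tuv lang="de"><seg></seg></tuv>
  <tuv lang="fr"><seg></seg></tuv>
</tu>
</root>"""


def check_lengths(xml_text, lang):
    """Return (key, note, target, limit, fits) for each <tu> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for tu in root.findall(".//tu"):
        seg = tu.find("tuv[@lang='%s']/seg" % lang)
        # An empty <seg></seg> has text None, which is treated as ''.
        target = (seg.text or "") if seg is not None else ""
        limit = int(tu.get("maxlen"))
        rows.append((tu.get("key"), tu.get("note"), target, limit,
                     len(target) <= limit))
    return rows


for row in check_lengths(SAMPLE, "de"):
    print(row)
# ('ID 254', 'context note', 'Erste segment', 16, True)
# ('ID 255', 'another context note', '', 18, True)
```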

Import Options

For the import of Multilingual .XML files, the XPath query language must be used. See example above for reference. The XPath expression defines the elements in which the text/value should be translated and not the actual text node.

  • Elements containing source and target sub-elements

    //tu

  • Elements containing source text

    tuv[@lang='en']/seg (in relation to the parent element //tu)

  • Elements containing target text

    tuv[@lang='de']/seg (in relation to the parent element //tu)

  • Elements containing target text

    tuv[@lang='fr']/seg (in relation to the parent element //tu)

  • Non-translatable inline elements

    All elements in source or target are considered Translatable inline elements unless specified here as Non-translatable inline elements.

  • Convert to Phrase TMS tags 

    Apply regular expressions to convert specified text to tags.

  • Context key

    Specify a context key that is saved with the segment to the translation memory and used for match context.

  • Context note

    Import elements or context attributes for each element.

  • Max. target length

    Import elements or the maximum target length for each element.

  • Convert to character entities

    Enter a comma-separated list of character references to be used in the output file.

    Example:

    To export quotation marks (") as &quot; and the character Σ as &#x3A3;, enter &quot;,&#x3A3;. The characters & and < are always exported as &amp; and &lt; respectively.

  • Parse ICU messages

    ICU messages are automatically converted to tags. Files with ICU messages cannot contain any inline elements.

  • Use HTML subfilter 

    Imports HTML tags contained in the file. Tags can then be used with HTML File Import Settings. Paragraph tags <p> will create new segments even if Segment multilingual XML is not selected.

  • Segment multilingual XML

    Text is segmented by a general segmentation rule rather than one segment per cell.

    Caution

    Applying Segment multilingual XML to a file that contains target text may result in a different number of segments in the source than in the target.

  • Set segment status of non-empty target 

    Select default confirmation status and whether confirmed segments are automatically added to TM.
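
The way the source and target expressions are evaluated relative to each //tu element, and the rule that only the first match inside a unit is imported, can be tried out with Python's standard xml.etree.ElementTree. This is a sketch under assumptions: the shortened sample (with an extra duplicate <seg>) and the helper names are hypothetical, and ElementTree needs .//tu where the settings use //tu.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<root>
<tu>
  <tuv lang="en"><seg>First segment.</seg><seg>Ignored duplicate.</seg></tuv>
  <tuv lang="de"><seg>Erstes Segment.</seg></tuv>
  <tuv lang="fr"><seg></seg></tuv>
</tu>
<tu>
  <tuv lang="en"><seg>Second segment.</seg></tuv>
  <tuv lang="de"><seg></seg></tuv>
  <tuv lang="fr"><seg></seg></tuv>
</tu>
</root>"""

UNIT_XPATH = ".//tu"                  # //tu in the settings above
SOURCE_XPATH = "tuv[@lang='en']/seg"  # evaluated relative to each <tu>
TARGET_XPATHS = {"de": "tuv[@lang='de']/seg", "fr": "tuv[@lang='fr']/seg"}


def first_text(unit, xpath):
    """Text of the first element matching xpath inside unit, or ''."""
    match = unit.find(xpath)  # find() returns only the first match
    return (match.text or "") if match is not None else ""


def bilingual_pairs(xml_text, target_lang):
    """One (source, target) pair per <tu>, as in one bilingual job."""
    root = ET.fromstring(xml_text)
    return [(first_text(tu, SOURCE_XPATH),
             first_text(tu, TARGET_XPATHS[target_lang]))
            for tu in root.findall(UNIT_XPATH)]


for lang in TARGET_XPATHS:
    print(lang, bilingual_pairs(SAMPLE, lang))
# de [('First segment.', 'Erstes Segment.'), ('Second segment.', '')]
# fr [('First segment.', ''), ('Second segment.', '')]
```

Each target language yields its own list of pairs, which mirrors how one multilingual file becomes several bilingual jobs.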

Example:

If a multilingual .XML file uses namespaces, the XPath expressions could be the following:

  • Elements containing source and target sub-elements

    //*[local-name()='trans-unit']

  • Elements containing source text

    *[local-name()='source']

  • Elements containing target text

    *[local-name()='target']
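
The reason for local-name() is that a plain element name does not match elements that belong to a namespace. The sketch below demonstrates this with Python's standard xml.etree.ElementTree. Since that module does not support the local-name() function, a small hypothetical helper compares local names by hand; the XLIFF-style sample is also made up for the illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
  <file><body>
    <trans-unit id="1"><source>Hello</source><target>Hallo</target></trans-unit>
    <trans-unit id="2"><source>World</source><target></target></trans-unit>
  </body></file>
</xliff>"""


def local_name(element):
    """Strip the '{namespace}' prefix ElementTree puts before tag names."""
    return element.tag.rsplit("}", 1)[-1]


def children_named(parent, name):
    """Direct children whose local name equals `name`."""
    return [child for child in parent if local_name(child) == name]


root = ET.fromstring(SAMPLE)

# A plain name finds nothing because every element is in a namespace.
print(len(root.findall(".//trans-unit")))  # 0

# Equivalent of //*[local-name()='trans-unit'] with relative source/target.
units = [el for el in root.iter() if local_name(el) == "trans-unit"]
pairs = []
for unit in units:
    source = children_named(unit, "source")[0].text or ""
    target = children_named(unit, "target")[0].text or ""
    pairs.append((source, target))
print(pairs)  # [('Hello', 'Hallo'), ('World', '')]
```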
