The .XML file format is not designed for translation and requires additional settings for successful import.
Default settings are marked with an asterisk (*) and will import all XML elements for translation. Import options can be used to change the import behavior.
AI chatbots can be effective at identifying format problems in .XML-based files.
File Types
-
.XML
Import Options
Plain import rules
-
Only selected elements (e.g. name, title, para) are imported. An asterisk (*) imports all elements.
-
Only selected attributes (e.g. name, title, para) are imported. An asterisk (*) imports all attributes.
-
If the Translatable inline elements option is selected, all elements in the translatable text are imported as
-
Selected inline elements (e.g. name, title, para) will be converted into tags, and their content will not be translatable.
-
Elements that are neighbors of text nodes will be automatically converted to inline tags.
-
Selected element code is processed as .HTML. .HTML import settings such as Preserve Whitespaces or Break tag (<br/>) creates new segment can be used for these elements.
Use this option when the selected element value contains .HTML markup. It does not apply to children of the selected element, unless otherwise specified.
-
The selected elements will be imported as Locked.
-
The selected attributes will be imported as Locked.
-
ICU messages are automatically converted to tags. Files with ICU messages cannot contain any inline elements.
-
XML entities in DTD Declaration will be imported for translation.
-
Deselect if segmentation is not desired.
-
Comments are not imported if elements are processed as HTML as indicated in the Elements (processed as HTML) option.
-
Apply regular expressions to convert specified text to tags.
-
Enter a list of character references (separated by commas) into the output file.
Example:
XML settings using XPath
Using the XPath query language allows for the creation of complex import rules and some additional features unavailable in plain import rules.
The XPath expression should select the elements and/or attributes whose text or value is to be translated, not the text node itself.
Familiarity with XPath is recommended before using these settings.
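As a rough illustration of selecting elements and attributes rather than text nodes, the sketch below uses Python's standard xml.etree.ElementTree module, which supports only a limited XPath subset. The sample document, its element names, and the helper translatable_values are hypothetical and are not part of Phrase; the sketch only shows the idea of pointing an expression at an element (for example .//para) instead of its text node.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; element and attribute names are illustrative only.
SAMPLE = """<doc>
  <title>Welcome</title>
  <para id="p1">First paragraph.</para>
  <image alt="Company logo"/>
  <code>do_not_translate()</code>
</doc>"""


def translatable_values(root, element_paths, attribute_specs):
    """Collect the text of selected elements and the values of selected attributes.

    element_paths: paths that select elements (e.g. ".//para"),
                   not their text nodes (no ".../text()").
    attribute_specs: (element path, attribute name) pairs.
    """
    values = []
    for path in element_paths:
        for element in root.findall(path):
            if element.text and element.text.strip():
                values.append(element.text.strip())
    for path, attribute in attribute_specs:
        for element in root.findall(path):
            if attribute in element.attrib:
                values.append(element.attrib[attribute])
    return values


root = ET.fromstring(SAMPLE)
selected = translatable_values(
    root,
    element_paths=[".//title", ".//para"],
    attribute_specs=[(".//image", "alt")],
)
print(selected)
```

The code element is left out simply because no expression selects it, which mirrors how unselected elements stay untranslated.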
, , and will not be processed for files with more than 10,000 XML elements.
-
Constitutes TM context (101% matches) if applicable.
-
Import elements or context attributes for each element.
-
Import elements or the maximum target length for each element. The character limit for each segment is displayed on the pane inside the editor. Any character exceeding the limit is highlighted in red.
-
Keep empty to preserve whitespaces in elements that apply xml:whitespace='preserve'. Use //* to preserve all whitespaces in all elements, or use an arbitrary XPath expression.
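The maximum target length behavior described in the list above can be pictured with a small sketch. It assumes, purely for illustration, that the limit is stored in an attribute named maxlen on the element and that the limit is a plain character count; the element, the attribute name, and the helper split_at_limit are hypothetical, and Phrase may count characters differently (for example around tags).

```python
import xml.etree.ElementTree as ET

# Hypothetical element; "maxlen" is an illustrative attribute name.
element = ET.fromstring('<title maxlen="16">Product overview and prices</title>')


def split_at_limit(text, max_length):
    """Return (part within the limit, part exceeding the limit)."""
    return text[:max_length], text[max_length:]


limit = int(element.get("maxlen"))
within, overflow = split_at_limit(element.text, limit)

# The overflow part corresponds to the characters an editor would flag.
print(f"limit={limit} within={within!r} overflow={overflow!r}")
```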
HTML preview with XSLT stylesheet
XSLT (Extensible Stylesheet Language Transformations) can be used to transform .XML documents into .HTML format for in-context preview purposes. Accordingly, preview files downloaded via Preview translation in the Document menu have the .HTML extension. Phrase currently supports XSLT 2.0.
Click Choose file to import a stylesheet.
Click Download XSLT to download the stylesheet after file import.
CDATA in XML file
CDATA means Character Data and is defined as blocks of text that are not parsed by the parser but would otherwise be recognized as markup. Predefined entities such as &lt;, &gt;, and &amp; require typing and are generally difficult to read in the markup. In such cases, the CDATA section can be used.
If CDATA contains embedded .HTML, the corresponding XML elements should be listed under
. If the source file contains CDATA and the
is used, then CDATA is added to every segment in the Completed file. CDATA will only be segmented if there is a clear indication of a segment break such as punctuation or spacing.
Source:
<text><![CDATA[Translatable text A. Translatable text B.]]></text>
Target:
<text><![CDATA[Translatable text A.]]><![CDATA[ ]]><![CDATA[Translatable text B.]]></text>
The Completed file is valid .XML and the XML viewer will display the text correctly as Translatable text A. Translatable text B.
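A quick way to check this equivalence is to parse both forms and compare the resulting text. The sketch below uses Python's standard xml.etree.ElementTree module as a stand-in for any XML parser; the third string, markup_like, is a hypothetical addition that shows characters inside CDATA being kept as literal text rather than parsed as markup.

```python
import xml.etree.ElementTree as ET

source = "<text><![CDATA[Translatable text A. Translatable text B.]]></text>"
target = (
    "<text><![CDATA[Translatable text A.]]><![CDATA[ ]]>"
    "<![CDATA[Translatable text B.]]></text>"
)

# Adjacent CDATA sections are concatenated into one run of character data.
source_text = ET.fromstring(source).text
target_text = ET.fromstring(target).text
print(target_text)
print(source_text == target_text)

# Hypothetical extra case: "<b>" and "&" inside CDATA stay literal text.
markup_like = ET.fromstring("<text><![CDATA[Use <b>bold</b> & more]]></text>")
print(markup_like.text, len(list(markup_like)))
```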
Multilingual files are imported as multiple bilingual jobs with languages mapped before import. They are represented with in the jobs table. If imported into several target languages, the Completed file is composed of all target languages.
Phrase supports XML files that have both source and target elements present for all paragraphs, even if the target is empty. When the source and target segmentation differ, the source segmentation takes precedence.
Individual language elements must all be descendants of the same trans-unit element and one language cannot be contained within the other. Source and target content cannot be stored in attribute values. If multiple elements match the XPath for source or target inside the trans-unit element, only the first one is imported for translation.
-
When creating a job, select from the pane before applying Import Options. If not specified, the file will be imported as standard .XML.
-
Tag content of the source .XML file can be visualized in the editor by clicking Expand tags under the menu and edited by pressing F2.
Example:
Sample of partially translated text from English to German and French. All <tuv lang="en">, <tuv lang="de">, and <tuv lang="fr"> elements are children of the same <tu> element.
<?xml version="1.0" encoding="utf-8"?>
<root>
  Not translatable text.
  <tu note="context note" key="ID 254" maxlen="16">
    <tuv lang="en">
      <seg>First segment.</seg>
    </tuv>
    <tuv lang="de">
      <seg>Erste segment</seg>
    </tuv>
    <tuv lang="fr">
      <seg></seg>
    </tuv>
  </tu>
  <tu note="another context note" key="ID 255" maxlen="18">
    <tuv lang="en">
      <seg>Second segment.</seg>
    </tuv>
    <tuv lang="de">
      <seg></seg>
    </tuv>
    <tuv lang="fr">
      <seg></seg>
    </tuv>
  </tu>
</root>
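To make the structure of the sample more concrete, here is a minimal sketch of reading it with Python's standard xml.etree.ElementTree module. It is not how Phrase imports the file; the constant names TRANS_UNIT_XPATH, SOURCE_XPATH, and TARGET_XPATHS and the helper read_units are hypothetical labels. The source and target paths are evaluated relative to each tu element, and find() returns only the first match, which mirrors the rule that only the first matching element is imported. The XML declaration is omitted from the embedded sample for brevity.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<root>
  Not translatable text.
  <tu note="context note" key="ID 254" maxlen="16">
    <tuv lang="en"><seg>First segment.</seg></tuv>
    <tuv lang="de"><seg>Erste segment</seg></tuv>
    <tuv lang="fr"><seg></seg></tuv>
  </tu>
  <tu note="another context note" key="ID 255" maxlen="18">
    <tuv lang="en"><seg>Second segment.</seg></tuv>
    <tuv lang="de"><seg></seg></tuv>
    <tuv lang="fr"><seg></seg></tuv>
  </tu>
</root>"""

# Hypothetical labels for the expressions; paths are relative to each <tu>.
TRANS_UNIT_XPATH = ".//tu"
SOURCE_XPATH = "tuv[@lang='en']/seg"
TARGET_XPATHS = {"de": "tuv[@lang='de']/seg", "fr": "tuv[@lang='fr']/seg"}


def read_units(xml_text):
    """Return one dict per trans-unit with its key, source, and targets."""
    root = ET.fromstring(xml_text)
    units = []
    for unit in root.findall(TRANS_UNIT_XPATH):
        source = unit.find(SOURCE_XPATH)  # first match only
        targets = {}
        for lang, path in TARGET_XPATHS.items():
            seg = unit.find(path)  # first match only
            targets[lang] = (seg.text or "") if seg is not None else ""
        units.append({
            "key": unit.get("key"),
            "source": (source.text or "") if source is not None else "",
            "targets": targets,
        })
    return units


units = read_units(SAMPLE)
for unit in units:
    print(unit)
```

Empty seg elements come back as empty strings, so an untranslated target is still present as a slot, as the format requires.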
Import Options
For the import of Multilingual .XML files, the XPath query language must be used. See example above for reference. The XPath expression defines the elements in which the text/value should be translated and not the actual text node.
-
//tu
-
tuv[@lang='en']/seg (in relation to the parent element //tu)
-
tuv[@lang='de']/seg (in relation to the parent element //tu)
-
tuv[@lang='fr']/seg (in relation to the parent element //tu)
-
All elements in source or target are considered Translatable inline elements unless specified here as Non-translatable inline elements.
-
Apply regular expressions to convert specified text to tags.
-
Specify a context key that is saved with the segment to the translation memory and used for match context.
-
Import elements or context attributes for each element.
-
Import elements or the maximum target length for each element
-
Enter a list of character references (separated by commas) into the output file.
Example:
-
ICU messages are automatically converted to tags. Files with ICU messages cannot contain any inline elements.
-
Imports HTML tags contained in the file. Tags can then be used with HTML File Import Settings. Paragraph tags <p> will create new segments even if is unselected.
-
Text is segmented by a general segmentation rule rather than one segment per cell.
Caution
Applying
to a file that contains target text may result in a different number of segments in the source than in the target.
-
Select default confirmation status and whether confirmed segments are automatically added to TM.
Example: