Import Utilities

Regexp (TMS)

A regular expression (abbreviated as regex or regexp) is a sequence of characters that forms a search pattern, mainly used for matching patterns in strings. The functionality is similar to find-and-replace operations, but with more complexity and specificity. See the Wikipedia entry on regular expressions for a detailed description and a table of the characters used.

To use multiple regexps at a time, insert a pipe character | between them.
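As a rough illustration of combining patterns, the sketch below joins two of the patterns from the General Examples in this article with | and lists everything either one matches. It uses Python's re module purely for demonstration (Phrase itself uses Java regexp), and the sample sentence is made up.

```python
import re

# Two independent patterns: one for HTML-style tags, one for {variables}.
html_tag = r"<[^>]+>"
curly_variable = r"\{[^\}]+\}"

# The | character means "match either the left or the right pattern".
combined = re.compile(html_tag + "|" + curly_variable)

sample = "Click <b>{button_name}</b> to continue."  # made-up sample text
matches = combined.findall(sample)
print(matches)
```

Each alternative is tried at every position, so the result contains tags and variables in the order they appear in the text.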

Regexps can be used in the filter, search and replace fields in the CAT desktop editor, in the source and target fields of the Search for content feature, for the Convert to tags feature in File import settings and for customizing segmentation rules.

Important

Phrase supports Java regexp, but will reject complex regular expressions to protect the system from overload. Complex regexps are those with quantifiers (except possessives) on groups which contain other quantifiers (except possessives).
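To make the definition of a complex regexp more concrete, here is a small sketch of the shape it describes, next to a flatter alternative. It is written with Python's re module only for illustration; whether Phrase accepts or rejects any particular pattern has not been verified here, and the sample string is made up.

```python
import re

sample = "<p><b>Hello</b></p>"  # made-up sample text

# A quantifier (+) applied to a group that itself contains a quantifier (+).
# By the definition above, this shape counts as complex.
nested = re.compile(r"(<[^>]+>)+")

# A flatter alternative: keep the quantifier inside the pattern only and
# collect each tag as a separate match.
flat = re.compile(r"<[^>]+>")

runs = [m.group(0) for m in nested.finditer(sample)]
tags = flat.findall(sample)
print(runs)  # runs of adjacent tags
print(tags)  # individual tags
```

Both approaches cover the same characters of the sample; the flat pattern simply reports each tag on its own instead of grouping adjacent tags into one match.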

General Examples

Examples for converting text into tags when importing files and using regexp in the editor for search and replace functions:

Example

Description

<[^>]+>

represents <html_tag>

\{[^\}]+\}

represents {variable}

\[[^\]]+\]

represents [variable]

\[\[.+?\]\]

represents [[aa[11]bb]]

\$[^\$]+\$

represents $operator_Name1$

\d+

represents numbers; [0-9]+ can also be used

[A-Za-z0-9]

represents any alphanumeric character.

.+\@.+\..+

email address name@domain.com

\d{4}[-]\d{2}[-]\d{2}

the date 2018-08-01

\s$

a whitespace at the end of the segment

^\s

a whitespace at the beginning of the segment

\s\s

a double whitespace

^\d

a digit at the beginning of the segment

\w+\s\s\w+

a double whitespace between words

\s\n

a newline preceded by any whitespace character

\S\n

a newline preceded by any non-whitespace character

<[^>]+>|\$[^=]+=

converts PHP variables and HTML code ($svariable['name'] =)

^\s*\'[^:]+:

converts a JavaScript field key with added whitespace at the beginning of the line ( 'key' :)

\{\{[^\}]+\}\}|\'[^']+\'

excludes {{text here}} and 'text here' content from translation by converting it to tags
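The following sketch runs a handful of the patterns from the table above against short sample strings to show what each one actually matches. It uses Python's re module for illustration only (Phrase uses Java regexp, which is expected to behave the same way for these particular patterns); the sample sentences and the helper name first_match are made up.

```python
import re

# (pattern, sample text, expected first match).
# Patterns come from the table above; the sample sentences are made up.
examples = [
    (r"\[\[.+?\]\]", "See [[aa[11]bb]] here", "[[aa[11]bb]]"),
    (r"\$[^\$]+\$", "Use $operator_Name1$ now", "$operator_Name1$"),
    (r"\d{4}[-]\d{2}[-]\d{2}", "Released 2018-08-01.", "2018-08-01"),
    (r"\s$", "Trailing space ", " "),
    (r"^\s", " Leading space", " "),
    (r"\w+\s\s\w+", "A double  space", "double  space"),
]

def first_match(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    match = re.search(pattern, text)
    return match.group(0) if match else None

for pattern, text, expected in examples:
    print(repr(pattern), "->", repr(first_match(pattern, text)))
```

Note that \w+\s\s\w+ skips the single space after "A" and only reports the pair of words separated by two whitespace characters.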

TXT Import

Examples of regular expressions when importing a specific text:

  1. ## ErrorMessage ##1## The number must be higher than 0. ##Z##

    To import text between ##1## and ##Z##, use regexp: (?<=##1## ).*(?= ##Z##)

  2. ErrorMessage ("The number must be higher than 0.")

    To import text between (" and "), use regexp: (?<=\(").*(?="\))

  3. 'errorMessage' = 'The number must be higher than 0.'

    To import text after the = sign and between ' and ', use regexp: (?<=\= ').*(?=')

  4. errorMessage = "this is to be translated"

    To import text after the = sign and between " and ", use regexp: (?<=\= ").*(?=")

  5. msgstr "The number must be higher than 0."

    To import msgstr strings in monolingual PO files using a TXT filter, use regexp: (?<=msgstr ").*(?=")

  6. # Note: This is a note

    To exclude lines starting with #, use regexp: (^[^#].*)

  7. values '126', 'DCeT', 'Text (en)'

    To import only text in quotes and with (en), such as 'Text (en)', use regexp: (?<=')[^']*\(en\)(?=')
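The lookbehind (?<=...) and lookahead (?=...) constructs used in these examples check the surrounding text without including it in the match. The sketch below applies a few of the expressions to their sample lines. It uses Python's re module for illustration (Phrase uses Java regexp), and the helper name extract is made up.

```python
import re

def extract(pattern, line):
    """Return the part of line matched by pattern, or None if nothing matches."""
    match = re.search(pattern, line)
    return match.group(0) if match else None

# Example 1: text between ##1## and ##Z##
line_1 = "## ErrorMessage ##1## The number must be higher than 0. ##Z##"
text_1 = extract(r"(?<=##1## ).*(?= ##Z##)", line_1)

# Example 2: text between (" and ")
line_2 = 'ErrorMessage ("The number must be higher than 0.")'
text_2 = extract(r'(?<=\(").*(?="\))', line_2)

# Example 6: a line starting with # yields no match and is therefore excluded
comment = extract(r"(^[^#].*)", "# Note: This is a note")

# Example 7: only the quoted value that ends in (en)
line_7 = "values '126', 'DCeT', 'Text (en)'"
text_7 = extract(r"(?<=')[^']*\(en\)(?=')", line_7)

print(text_1, text_2, comment, text_7, sep="\n")
```

In each case the delimiters (##1##, quotes, parentheses) are required for the match but are left out of the imported text.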

JSON Import

JSON structure example:

{
"list": {
        "id": "1",
        "value": "text 1 for translation."
        },
"text": {
        "id": "2",
        "value": "text 2 for translation."
        },
"menu": {
        "id": "3",
        "value": "text 3 for translation."
        },
"array": ["blue", "green"],
"arrays": [
        {
        "color": "blue",
        "title": "BLUE"
        },
        {
        "color": "green",
        "title": "GREEN"
        }
    ]
}

  • for importing every value regardless of the level, use: (^|.*/)value

  • for importing only one value from a list, use: list/value

  • for importing a value from a list and/or menu, use the | (OR) operator: list/value|menu/value

  • for importing only the first instance of a value from a menu, use: menu\[1\]/value

  • for importing the content of a JSON array following a certain key, use: (^|.*/)array\[.*\]

  • to import the content of a specific array of objects, use: (^|.*/)arrays\[.*\].*
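One way to reason about these expressions is to picture every JSON value as having a path and the regexp as being matched against that whole path. The sketch below is only a model of that idea, not Phrase's implementation: the path format (keys joined with /, array items written as name[index] starting at 1) is inferred from the patterns above, whole-path matching is an assumption, and the function names flatten and select are made up. It uses Python for illustration and a shortened version of the sample structure.

```python
import json
import re

def flatten(node, path=""):
    """Yield (path, value) pairs for every leaf of a parsed JSON document.

    Assumed path format: keys joined with "/", array items written as
    name[index] with the index starting at 1.
    """
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(child, f"{path}/{key}" if path else key)
    elif isinstance(node, list):
        for index, child in enumerate(node, start=1):
            yield from flatten(child, f"{path}[{index}]")
    else:
        yield path, node

def select(document, pattern):
    """Return the leaf values whose whole path matches pattern (assumption)."""
    regex = re.compile(pattern)
    return [value for path, value in flatten(document) if regex.fullmatch(path)]

document = json.loads("""
{
  "list": {"id": "1", "value": "text 1 for translation."},
  "menu": {"id": "3", "value": "text 3 for translation."},
  "array": ["blue", "green"],
  "arrays": [{"color": "blue", "title": "BLUE"},
             {"color": "green", "title": "GREEN"}]
}
""")

print(select(document, r"(^|.*/)value"))
print(select(document, r"list/value"))
print(select(document, r"(^|.*/)array\[.*\]"))
print(select(document, r"(^|.*/)arrays\[.*\].*"))
```

Under these assumptions, (^|.*/)array\[.*\] selects only the items of "array" and not those of "arrays", because the literal [ must follow the key name directly.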

YAML Import

YAML file example:

title: A
text: translate A
categories:
  title: B
  text: translate B
categories:
  title: C
  text: translate C
categories:
  content:
      title: D
      text: translate D

Regexps for importing:

  • only 'translate A': text

  • only 'translate C': categories\[2\]/text

  • only 'translate D': categories\[\d+\]/content\[1\]/text

  • all text: text|categories\[\d+\]/text|categories\[\d+\]/content\[\d+\]/text
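In these expressions, both square brackets of an index need a backslash (\[ and \]); an unescaped [ starts a character class instead of matching a literal bracket. The sketch below illustrates this with Python's re module. The element paths are hypothetical, written out by hand in the form the patterns above suggest, and whole-path matching is an assumption rather than confirmed Phrase behavior.

```python
import re

# Hypothetical element paths for the YAML sample above (not produced by Phrase).
paths = {
    "title": "A",
    "text": "translate A",
    "categories[1]/text": "translate B",
    "categories[2]/text": "translate C",
    "categories[3]/content[1]/text": "translate D",
}

def select(pattern):
    """Return the values whose whole path matches pattern (assumption)."""
    regex = re.compile(pattern)
    return [value for path, value in paths.items() if regex.fullmatch(path)]

only_a = select(r"text")
only_c = select(r"categories\[2\]/text")
all_text = select(
    r"text|categories\[\d+\]/text|categories\[\d+\]/content\[\d+\]/text"
)

# With the backslash before "[" missing, the bracket opens a character class
# that is never closed, so the pattern does not even compile.
try:
    re.compile(r"categories\[\d+\]/content[\d+\]/text")
    unescaped_compiles = True
except re.error:
    unescaped_compiles = False

print(only_a, only_c, all_text, unescaped_compiles, sep="\n")
```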

Segmentation Rules

Okapi, Java and Unicode are used for segmentation rules in .SRX files.

Using regexp in .SRX files is complex and a basic knowledge of regular expression use is recommended before attempting to work with them.

Nobreak rules (abbreviations, etc.) and break rules (end of a sentence with a period, etc.) are defined in .SRX files.

Example

Description

[\p{C}]

Invisible control character.

[\p{Z}]

Whitespace

[\p{Lu}]

An uppercase letter that has a lowercase variant.

[\p{N}]

Any kind of numeric character.

\Q ... \E

Start and end of a literal quotation: everything between \Q and \E is matched literally, for example (\QApprox.\E). This is used for abbreviations.

\t

Tabulator

\n

Newline

\u2029

Paragraph separator

\u200B

Zero-width space

\u3002

Ideographic full stop

\ufe52

Small full stop

\uff0e

Fullwidth full stop

\uff61

Halfwidth ideographic full stop

\ufe56

Small question mark

\uff1f

Fullwidth question mark

\u203c

Double exclamation mark

\u2048

Question exclamation mark

\u2762

Heavy exclamation mark ornament

\u2763

Heavy heart exclamation mark ornament

\ufe57

Small exclamation mark

\uff01

Fullwidth exclamation mark
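As a simplified illustration of how the characters in this table can mark sentence ends, the sketch below splits a made-up Japanese sample after three of them. It is not an SRX rule and not how Phrase segments text; it uses Python's re module (3.7 or later, which allows splitting on zero-width matches) rather than Java. Python has no \Q...\E, so re.escape is used as the closest equivalent for quoting an abbreviation literally.

```python
import re

# Sentence-ending characters from the table above, written as Unicode escapes:
# ideographic full stop, fullwidth question mark, fullwidth exclamation mark.
break_after = re.compile(r"(?<=[\u3002\uff1f\uff01])")

text = "こんにちは。元気ですか？はい！"  # made-up sample text
# Splitting at the end of the text leaves an empty string, which is dropped.
sentences = [part for part in break_after.split(text) if part]
print(sentences)

# Quoting an abbreviation literally, as \QApprox.\E does in Java:
# without quoting, the dot would match any character.
quoted = re.compile(re.escape("Approx."))
unquoted = re.compile("Approx.")
print(bool(quoted.fullmatch("Approxx")), bool(unquoted.fullmatch("Approxx")))
```

The lookbehind keeps each sentence-ending character attached to the sentence it closes.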
