Regex in Web Editor (TMS)

Regex support in the web editor is limited by the implementation of the Lucene Regex engine.

To use regex, enable Match using regex in filter settings. A green checkmark in the filter input field indicates that the regex is valid.

Queries are by default case insensitive. Enable Match case in filter settings to make them case sensitive.

The Match words option, which restricts matches to complete words rather than substrings within longer words, is not available.

The query ^abc$ will work as expected, i.e. matching the whole segment, while abc will match substring abc in any text.
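
The difference between the two queries can be tried outside the editor. The sketch below uses Python's re module, which is not the Lucene engine the editor relies on, so it only illustrates the anchoring behaviour; the sample segments are made up.

```python
import re

segments = ["abc", "xx abc yy", "abcd"]

# Unanchored query: matches wherever "abc" occurs inside the segment.
substring_hits = [s for s in segments if re.search(r"abc", s)]

# Anchored query: matches only when the whole segment is exactly "abc".
whole_hits = [s for s in segments if re.search(r"^abc$", s)]

print(substring_hits)  # ['abc', 'xx abc yy', 'abcd']
print(whole_hits)      # ['abc']
```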

Limitations

Replacement in partially formatted text is not handled correctly, e.g. applying the replacement "Jméno: $1, Příjmení: $2." to the text "{b>First name<}: Bob, {biu>Last name<biu}: Dylan." produces “{>Jméno: Bob, Příjmení: Dylan<b}”.

Unsupported patterns

  • Word boundary anchor \b used to match exact words (works in the desktop editor).

  • \\[1-9] - Backreferences (\1, \2, etc.), e.g. (\w+)\s+\1 to match duplicated words like “hello hello”

  • \(\?=|\(\?!|\(\?<=|\(\?<! - Lookahead and lookbehind, e.g. cat(?=\.jpg) to match “cat” only in “cat.jpg”

  • \(\?: - Non-capturing groups, e.g. (?:Mr|Mrs|Ms)\. \w+, but capturing groups (Mr|Mrs|Ms)\. \w+ are supported and match names like “Mrs. Smith”, “Mr. Brown”

  • \(\?# - Inline comments, e.g. \d{4}-(?# year)\d{2}-(?# month)\d{2}(?# day) to match “2025-06-25”

  • \(\?P<[^>]+> - Named capture groups, e.g. (?P<amount>\d+)\s?(?P<currency>USD|EUR) to match “150 USD” and “99 EUR”
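
The detection patterns in the list above can be used to pre-check a query before pasting it into the filter. The following is a minimal sketch in Python, not part of the editor: the function name find_unsupported and the labels are hypothetical, and the \\b detector for word boundaries is our own addition. The scan is naive and does not account for escaped backslashes.

```python
import re

# Detection patterns taken from the list above; labels are illustrative.
# The word-boundary detector is an assumption added for completeness.
UNSUPPORTED = {
    "word boundary": r"\\b",
    "backreference": r"\\[1-9]",
    "lookaround": r"\(\?=|\(\?!|\(\?<=|\(\?<!",
    "non-capturing group": r"\(\?:",
    "inline comment": r"\(\?#",
    "named capture group": r"\(\?P<[^>]+>",
}

def find_unsupported(query: str) -> list:
    """Return the labels of unsupported constructs found in a regex query."""
    return [label for label, detector in UNSUPPORTED.items()
            if re.search(detector, query)]

print(find_unsupported(r"(\w+)\s+\1"))           # ['backreference']
print(find_unsupported(r"cat(?=\.jpg)"))         # ['lookaround']
print(find_unsupported(r"(Mr|Mrs|Ms)\. \w+"))    # []
```

An empty list only means none of the listed constructs were detected; it does not guarantee that the editor accepts the query.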

Usage

Basic Pattern Matching

Dot (.) as a placeholder for any single character

  • c.at: Matches: “chat”, “coat”. Does NOT match: “cat”, “cheat”

  • wa.ter: Matches: “waiter”, “waster”. Does NOT match: “water”

  • s.ip: Matches: “skip”, “ship”, “slip”. Does NOT match: “sip”, “strip”
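
The dot examples above can be checked with a short script. This is a sketch using Python's re module as a stand-in for the editor's engine, so it shows the expected behaviour of the dot rather than the editor itself; the variable names are our own.

```python
import re

# pattern -> (words that should match, words that should not)
cases = {
    r"c.at": (["chat", "coat"], ["cat", "cheat"]),
    r"wa.ter": (["waiter", "waster"], ["water"]),
    r"s.ip": (["skip", "ship", "slip"], ["sip", "strip"]),
}

results = {}
for pattern, (should_match, should_not) in cases.items():
    # re.search looks for the pattern anywhere in the word, like a substring filter.
    results[pattern] = (
        all(re.search(pattern, w) for w in should_match)
        and not any(re.search(pattern, w) for w in should_not)
    )

print(results)  # {'c.at': True, 'wa.ter': True, 's.ip': True}
```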

Quantifiers

? - Zero or one occurrence

  • colou?r: Matches: “color”, “colour”

  • g?rain: Matches: “grain”, “rain”

  • books?: Matches: “book”, “books”

.* - Any number of characters (including none)

  • h.*y: Matches: “happy”, “history”, “honey”

  • sa.*d: Matches: “sad”, “sand”, “satisfied”

  • m.*ing: Matches: “morning”, “meeting”, “marketing”

.+ - At least one character must appear

  • pa.+er: Matches: “paper”, “painter”

  • a.+ed: Matches: “asked”, “accepted”, “allowed”

* - Zero or more occurrences

  • go*al: Matches: “goal”, “goooooooal”

+ - One or more occurrences

  • no+: Matches: “no”, “noooooo”

  • $1+: Matches: “$1”, “$11”, “$111”
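
The quantifier examples above can be verified in the same way. The sketch below uses Python's re module as an approximation of the editor's engine. The $1+ example is left out because $ is an anchor in Python's re and would behave differently there.

```python
import re

examples = {
    r"colou?r": ["color", "colour"],
    r"g?rain": ["grain", "rain"],
    r"h.*y": ["happy", "history", "honey"],
    r"sa.*d": ["sad", "sand", "satisfied"],
    r"pa.+er": ["paper", "painter"],
    r"go*al": ["goal", "goooooooal"],
    r"no+": ["no", "noooooo"],
}

# fullmatch checks that the pattern accounts for the whole word.
all_match = {p: all(re.fullmatch(p, w) for w in words)
             for p, words in examples.items()}

# ".*" accepts zero characters, ".+" requires at least one.
star_matches_sad = bool(re.fullmatch(r"sa.*d", "sad"))  # True
plus_matches_sad = bool(re.fullmatch(r"sa.+d", "sad"))  # False

print(all(all_match.values()), star_matches_sad, plus_matches_sad)
```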

Use as specific a pattern as possible; open-ended patterns may cause performance issues in the editor.

Important

The editor has a built-in limit on how complex a regex pattern can be. Patterns that are too broad or that rely heavily on wildcards may be rejected as invalid regex. To avoid this:

  • Keep patterns short and specific. error-[0-9]{3} is fine; .*a.*b.*c.*d.* is not.

  • Minimize wildcards. Each .+ or .* multiplies the internal complexity. Prefer character classes like [A-Z]+ over .* where possible.

  • Avoid long alternations with repetition. A pattern like (word1|word2|...|word20){2,} can exceed the limit quickly. This is amplified when the alternatives include multiple words or punctuation, which adds to the regex complexity.

  • Anchor one side when possible. ^prefix.* is far cheaper than .*middle.*.

If the pattern is rejected, try making it more targeted: start with a longer fixed prefix and narrow down from there. Consider reducing the number of alternatives or handling them in separate patterns instead of combining everything into one complex regex.
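
As a rough aid when narrowing a rejected pattern, the number of open wildcards can be counted before the pattern is pasted into the filter. The helper below is a hypothetical sketch: the editor's actual complexity limit is not documented here, so the count is only a hint and not a prediction of whether a pattern will be accepted.

```python
import re

def count_open_wildcards(pattern: str) -> int:
    """Count '.*' and '.+' occurrences whose dot is not escaped (naive scan)."""
    return len(re.findall(r"(?<!\\)\.[*+]", pattern))

print(count_open_wildcards(r"error-[0-9]{3}"))  # 0
print(count_open_wildcards(r".*a.*b.*c.*d.*"))  # 5
print(count_open_wildcards(r"^prefix.*"))       # 1
print(count_open_wildcards(r".*middle.*"))      # 2
```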

Example:

Filtering for email addresses:

  • This pattern also matches spaces and all surrounding words, potentially returning too many results: .*@.*

  • To limit the results to all email addresses: [\w.+\-]+@[\w.+\-]+

  • To limit the results to .com emails only: [\w.+\-]+@[\w.+\-]+\.com

  • To limit the results to addresses with a digit before the @ sign: [\w.+\-]*\d+[\w.+\-]*@[\w.\-]+
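
The effect of narrowing the email pattern can be seen on a sample sentence. This sketch uses Python's re module as an approximation of the editor's engine; the sample text and variable names are invented for illustration, and the .com pattern repeats the domain character class with + so that it covers multi-character domains.

```python
import re

text = "Contact bob@example.com, ann.lee+tms@mail.example.org or j0hn@corp.net today"

broad = r".*@.*"
any_email = r"[\w.+\-]+@[\w.+\-]+"
com_only = r"[\w.+\-]+@[\w.+\-]+\.com"
digit_in_local = r"[\w.+\-]*\d+[\w.+\-]*@[\w.\-]+"

# The broad pattern swallows the entire sentence.
print(re.search(broad, text).group() == text)  # True

print(re.findall(any_email, text))
# ['bob@example.com', 'ann.lee+tms@mail.example.org', 'j0hn@corp.net']
print(re.findall(com_only, text))        # ['bob@example.com']
print(re.findall(digit_in_local, text))  # ['j0hn@corp.net']
```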

Alternations (OR operator)

  • cat|dog: Matches: “cat” and “dog”

  • red|blue|green: Matches: “red”, “blue”, “green”

Character Classes and Ranges

  • [A-Z]+: Matches one or more uppercase letters in a row (a sequence).

  • [A-Z]{2,}: Matches any series of two or more uppercase letters (useful for e.g. matching acronyms or strings written in uppercase letters)

  • [0-9]{4}: Matches four-digit numbers, e.g. "1999", "2003", "1876". Four digits inside a longer digit string also match; excluding them would require the Match words option, which is planned for a future release.

  • [A-Za-z0-9]+: Matches any alphanumeric string. In "hello!" only "hello" matches because ! is not part of [A-Za-z0-9]; in "100%" only "100" matches.

  • ([A-Za-z]+\d+|\d+[A-Za-z]+): Matches strictly a combination of digits and letters, e.g. “user123”, “Admin99”, “Win11”, “5g”, “1080p”

  • [0-9]{2,4}-[A-Z]{2,3}: Matches license plates, e.g. “12-XY”, “9999-ABC”
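
The character class examples above can be reproduced with the sketch below. It uses Python's re module, which is case sensitive by default, so it corresponds to filtering with Match case enabled; the sample strings are made up.

```python
import re

acronyms = re.findall(r"[A-Z]{2,}", "NASA and the EU met at UN HQ")
print(acronyms)  # ['NASA', 'EU', 'UN', 'HQ']

# Four digits are also found inside the longer number 123456.
years = re.findall(r"[0-9]{4}", "In 1999 order 123456 shipped")
print(years)  # ['1999', '1234']

alnum = re.findall(r"[A-Za-z0-9]+", "hello! 100%")
print(alnum)  # ['hello', '100']

mixed = re.findall(r"([A-Za-z]+\d+|\d+[A-Za-z]+)", "user123 Win11 5g 1080p plain 42")
print(mixed)  # ['user123', 'Win11', '5g', '1080p']

plates = re.findall(r"[0-9]{2,4}-[A-Z]{2,3}", "12-XY 9999-ABC 1-A")
print(plates)  # ['12-XY', '9999-ABC']
```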

Escaping Reserved Characters . ? + * | { } [ ] ( ) " \

  • \+[0-9]{1,2}: Matches “+40”, “+1”

  • \{version: [0-9]+\}: Matches “{version: 12}”, “{version: 13}”

  • C:\\[A-Za-z]+: Matches “C:\Users”, “C:\Documents”, “C:\Desktop”
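
When a literal string has to be searched with regex enabled, its reserved characters can be escaped programmatically before pasting it into the filter. The helper below is a hypothetical sketch, not an editor feature; the reserved set is an assumption based on the heading and the operators used in this article.

```python
import re

# Assumed reserved characters: . ? + * | { } [ ] ( ) " \
RESERVED = set('.?+*|{}[]()"\\')

def escape_reserved(literal: str) -> str:
    """Prefix every reserved character with a backslash."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in literal)

print(escape_reserved("{version: 12}"))  # \{version: 12\}
print(escape_reserved("+40"))            # \+40
print(escape_reserved("C:\\Users"))      # C:\\Users

# The escaped form matches the original literal text.
print(bool(re.fullmatch(escape_reserved("C:\\Users"), "C:\\Users")))  # True
```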

Case-insensitive vs Case-sensitive Filtering

  • Regex filtering is case insensitive by default. c.at: Matches: “chat”, “Chat”, “CHAT” and “coat”, “Coat”, “COAT”

  • Regex filtering can be combined with the Match case option in filter settings to make it case sensitive.
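
The two modes can be compared with the sketch below. Python's re.IGNORECASE flag stands in for the editor's default behaviour, and the plain call stands in for filtering with case sensitivity enabled; this is an approximation, not the editor's implementation.

```python
import re

words = ["chat", "Chat", "CHAT", "coat", "Coat", "COAT"]

# Default editor behaviour (case insensitive), approximated with re.IGNORECASE.
insensitive = [w for w in words if re.fullmatch(r"c.at", w, re.IGNORECASE)]

# Case-sensitive filtering: only the lowercase forms remain.
sensitive = [w for w in words if re.fullmatch(r"c.at", w)]

print(insensitive)  # ['chat', 'Chat', 'CHAT', 'coat', 'Coat', 'COAT']
print(sensitive)    # ['chat', 'coat']
```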

Capturing Groups

Regex capturing groups are recognized and the full match is highlighted, e.g. s(e)g will highlight "seg". Capturing groups can be used for replacement, e.g. "Name: Bob" can be found with Name: (.*?) and replaced with Jméno: $1, where $1 is a backreference to the captured text. Missing backreferences are handled gracefully, i.e. the query Name: (.*?) with the replacement Jméno: $1, Title: $2 produces Jméno: Bob, Title: $2.

Examples for replace backreference:

  • filter (\d+),(\d+) and replace $1.$2 to normalize decimal separators (e.g. from 5,6 or 35,949 to 5.6 or 35.949)

  • filter (\d+)\.(\d+) and replace $1,$2 to normalize decimal separators (e.g. from 5.6 or 35.949 to 5,6 or 35,949)

  • filter (\d{4})-(\d{2})-(\d{2}) and replace $3/$2/$1 to reformat date (e.g. from 2025-06-05 to 05/06/2025)

  • filter ID-(\d{3,}) and replace Ticket #$1 to extract number of the ticket (e.g. from ID-45321 to Ticket #45321)

  • filter (cat|dog) and replace $1-$1 to duplicate matched text (e.g. from cat to cat-cat and from dog to dog-dog)

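  • Optional group: filter Hello(, (\w+))? and replace Hi $2 to transform greetings by replacing "Hello" before names or on its own (e.g. from Hello, John to Hi John and from Hello to Hi). $2 is used because the name is captured by the inner, second group; $1 would also include the comma.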
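
The replacement examples above can be simulated outside the editor. Python's re module writes backreferences as \1 rather than $1, so the sketch below includes a small hypothetical helper, replace_with_dollar_refs, that expands $n references and leaves references to nonexistent groups untouched, as described above. The missing-backreference check uses (\w+) instead of the lazy (.*?) so that the result does not depend on how an engine treats lazy quantifiers.

```python
import re

def replace_with_dollar_refs(pattern: str, replacement: str, text: str) -> str:
    """Apply a $n-style replacement; references to missing groups stay as-is."""
    def expand(match):
        def ref(m):
            index = int(m.group(1))
            if index > match.re.groups:
                return m.group(0)            # no such group: keep "$n" literally
            return match.group(index) or ""  # group exists but did not participate
        return re.sub(r"\$(\d+)", ref, replacement)
    return re.sub(pattern, expand, text)

print(replace_with_dollar_refs(r"(\d+),(\d+)", "$1.$2", "5,6 and 35,949"))
# 5.6 and 35.949
print(replace_with_dollar_refs(r"(\d{4})-(\d{2})-(\d{2})", "$3/$2/$1", "2025-06-05"))
# 05/06/2025
print(replace_with_dollar_refs(r"ID-(\d{3,})", "Ticket #$1", "ID-45321"))
# Ticket #45321
print(replace_with_dollar_refs(r"Name: (\w+)", "Jméno: $1, Title: $2", "Name: Bob"))
# Jméno: Bob, Title: $2
```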
