Regular Expressions

A number of features in your Localize dashboard allow you to enter a regular expression, or "regex".

Examples of Regular Expressions

The following are examples of some common Regular Expressions that you might find in your content.

Item	Regular Expression	Examples
Date With Slashes	`\d{1,2}\/\d{1,2}\/\d{4}`	1/2/2020 12/15/1963
Date with Dashes and text-based 3-letter month	`\d{1,2}-[A-Z]{1}[a-z]{2}-\d{4}`	11-Jan-2014 5-Apr-1963
Date and time	`[a-zA-Z]{3} \d{1,2}, \d{4} \d{1,2}:\d{2} [AaPp][Mm]`	Aug 9, 2021 10:20 AM Sep 11, 1964 10:20 pm
Short time	`\d{1,2}:\d{2} [AaPp][Mm]`	12:34 PM 5:14 am
Countdown Timer (HH:MM:SS)	`\d{2}:\d{2}:\d{2}`	12:42:22 01:18:33
Brand name or other specific word	`Localize`	Any phrase that contains the literal string "Localize" anywhere in it
Specific ID numbers (e.g. US Social Security number)	`\d{3}-\d{2}-\d{4}`	111-22-3333
Specific word: dog	`^dog$`	Any phrase whose content is exactly "dog"
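
To check a pattern before entering it in the dashboard, you can try it against sample text. The following is a minimal sketch, assuming the dashboard's matching behaves like JavaScript's built-in RegExp (an assumption, not something confirmed here). The variable names and sample phrases are made up for illustration; the patterns come from the table above.

```typescript
// Patterns taken from the table above; sample phrases are illustrative only.
const dateWithSlashes = /\d{1,2}\/\d{1,2}\/\d{4}/;
const dateWithDashes = /\d{1,2}-[A-Z]{1}[a-z]{2}-\d{4}/;
const countdownTimer = /\d{2}:\d{2}:\d{2}/;
const brandName = /Localize/;
const exactlyDog = /^dog$/;

console.log(dateWithSlashes.test("Due 12/15/1963"));     // true
console.log(dateWithDashes.test("Posted 11-Jan-2014"));  // true
console.log(countdownTimer.test("Ends in 01:18:33"));    // true
console.log(brandName.test("Welcome to Localize!"));     // true: unanchored, matches anywhere
console.log(exactlyDog.test("dog"));                     // true
console.log(exactlyDog.test("hot dog"));                 // false: ^ and $ require the whole phrase to be "dog"
```

Note the difference between the last two patterns: a pattern without `^` and `$` matches anywhere in a phrase, while an anchored pattern must match the entire phrase.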

Special Characters in Regular Expressions

Special Characters

Regular expressions use special characters (metacharacters) to denote which characters to match in the source content. Take care when entering words or phrases that contain these characters, so that they aren't interpreted differently than you expect.

The special characters are:

backslash \
caret ^
dollar sign $
period or dot .
vertical bar or pipe symbol |
question mark ?
asterisk or star *
plus sign +
opening parenthesis (
closing parenthesis )
opening square bracket [
opening curly brace {

Escaping

To match a special character literally, you must "escape" it by placing a backslash in front of it.

For example, to exclude any phrase containing .tif, you would enter: `\.tif`

The backslash tells the parser to treat the "." as a literal character.
Without the backslash, `.tif` would match any phrase that has at least one character before "tif", such as any phrase containing the word "identify".
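
The effect of escaping can be seen in a short sketch, again assuming JavaScript-style regex behavior. The `escapeRegex` helper is hypothetical (it is not part of Localize); it shows one common way to prefix special characters with a backslash when you want to match arbitrary text literally.

```typescript
// Unescaped: "." matches any single character, so ".tif" also matches "ntif" in "identify".
const unescapedTif = /.tif/;
// Escaped: "\." matches only a literal period.
const escapedTif = /\.tif/;

console.log(unescapedTif.test("Please identify yourself")); // true
console.log(escapedTif.test("Please identify yourself"));   // false
console.log(escapedTif.test("Upload logo.tif"));            // true

// Hypothetical helper: puts a backslash in front of every regex special character.
// It also escapes "]" and "}", which is harmless.
function escapeRegex(text: string): string {
  return text.replace(/[\\^$.|?*+()[\]{}]/g, "\\$&");
}

console.log(escapeRegex("logo.tif (v2)")); // logo\.tif \(v2\)
```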

Unicode Regular Expressions

When using the Block New Phrases by Pattern setting, you can select the Unicode Regex option. This will automatically add the Unicode modifier /u to the end of the regex.

This lets you block source phrases based on the Unicode scripts used in specific languages.

Here are several examples:

Block Phrases With...	Regular Expression	Examples
Arabic	`[\p{sc=Arabic}]`	حظر هذه العبارة.
Cyrillic characters (e.g. Ukrainian, Bulgarian, etc.)	`[\p{sc=Cyrillic}]`	Заблокуйте цю фразу.
Chinese	`[\p{sc=Han}]`	屏蔽这句话。
Korean	`[\p{sc=Hangul}]`	이 문구를 차단하세요.
Japanese (Hiragana)	`[\p{sc=Hiragana}]`	このフレーズをブロックします。
Japanese (Katakana)	`[\p{sc=Katakana}]`	このフレーズをブロックします。
Tai Viet, a Brahmic script (e.g. Tai Dam, Tai Dón, and Thai Song)	`[\p{sc=Tai_Viet}]`	Any phrase containing Tai Viet script characters (Vietnamese written in the Latin alphabet is not matched)
Thai	`[\p{sc=Thai}]`	บล็อควลีนี้
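
As a rough illustration of how these patterns behave with the Unicode modifier, the sketch below uses JavaScript's RegExp with the `u` flag. This assumes the Unicode Regex option behaves like that flag; the variable names are made up for the example.

```typescript
// Patterns from the table above, built with the "u" (Unicode) flag.
const cyrillicPattern = new RegExp("[\\p{sc=Cyrillic}]", "u");
const hanPattern = new RegExp("[\\p{sc=Han}]", "u");
const thaiPattern = new RegExp("[\\p{sc=Thai}]", "u");

console.log(cyrillicPattern.test("Заблокуйте цю фразу.")); // true
console.log(hanPattern.test("屏蔽这句话。"));               // true
console.log(thaiPattern.test("บล็อควลีนี้"));                // true
console.log(hanPattern.test("Block this phrase."));        // false: no Han characters

// Without the "u" flag, JavaScript does not read \p{...} as a script property,
// so the same pattern text no longer matches Chinese characters.
const hanWithoutFlag = new RegExp("[\\p{sc=Han}]");
console.log(hanWithoutFlag.test("屏蔽这句话。"));           // false
```

Because each pattern is a single character class, one character from the script anywhere in the phrase is enough for a match.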

For other scripts, use the English name of the script you want to block in a regex pattern of the same form, with underscores in place of spaces (as in `Tai_Viet` above). You can find the names of scripts here: ISO 15924 Alphabetical Code List
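
The same pattern shape can be generated for any script name. The sketch below is illustrative only: `scriptPattern` is a hypothetical helper, and it assumes JavaScript-style Unicode regex support. Greek and Devanagari are used as examples of scripts not listed in the table.

```typescript
// Hypothetical helper: builds a pattern of the form [\p{sc=Name}] with the "u" flag.
function scriptPattern(scriptName: string): RegExp {
  return new RegExp("[\\p{sc=" + scriptName + "}]", "u");
}

console.log(scriptPattern("Greek").source);                 // [\p{sc=Greek}]
console.log(scriptPattern("Greek").test("Καλημέρα"));       // true
console.log(scriptPattern("Devanagari").test("नमस्ते"));     // true
console.log(scriptPattern("Greek").test("Good morning"));   // false
```

In JavaScript, a script name that is not recognized causes the RegExp constructor to throw a SyntaxError, so a misspelled name is caught when the pattern is built.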