Regular Expressions
A number of features in your Localize dashboard accept a Regular Expression ("regex"), including:
- Disable Localize by Page
- Disable Phrase Detection by Page
- Define Variables in Your Dynamic Phrases
- Block New Phrases by Pattern
Examples of Regular Expressions
The following are examples of common Regular Expressions for matching content you might have on your pages.
Item | Regular Expression | Examples |
---|---|---|
Date With Slashes | \d{1,2}\/\d{1,2}\/\d{4} | 1/2/2020 12/15/1963 |
Date with Dashes and text-based 3-letter month | \d{1,2}-[A-Z]{1}[a-z]{2}-\d{4} | 11-Jan-2014 5-Apr-1963 |
Date and time | [A-Za-z]{3} \d{1,2}, \d{4} \d{1,2}:\d{2} [AaPp][Mm] | Aug 9, 2021 10:20 AM Sep 11, 1964 10:20 pm |
Short time | \d{1,2}:\d{2} [AaPp][Mm] | 12:34 PM 5:14 am |
Countdown Timer (HH:MM:SS) | \d{2}:\d{2}:\d{2} | 12:42:22 01:18:33 |
Brand name or other specific word | Localize | Any phrase whose content contains the string literal "Localize" anywhere in it |
Specific ID #s (e.g. US social security #) | \d{3}-\d{2}-\d{4} | 111-22-3333 |
Specific word: dog | ^dog$ | Any phrase whose content is exactly "dog" |
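As a quick way to sanity-check patterns like these before entering them in the dashboard, the sketch below runs a few of them against sample values in plain JavaScript. It assumes the dashboard's regex behavior is close to JavaScript's `RegExp`, which this page does not confirm. The `samples` array and its field names are purely illustrative.

```javascript
// Hypothetical sanity check: each entry pairs a pattern with one text that
// should match ("yes") and one that should not ("no").
// Note: character classes are written without commas, because a comma inside
// [...] matches a literal comma.
const samples = [
  { name: "Date with slashes", pattern: /\d{1,2}\/\d{1,2}\/\d{4}/, yes: "12/15/1963", no: "12-15-1963" },
  { name: "Date with dashes", pattern: /\d{1,2}-[A-Z][a-z]{2}-\d{4}/, yes: "11-Jan-2014", no: "11-JAN-2014" },
  { name: "Short time", pattern: /\d{1,2}:\d{2} [AaPp][Mm]/, yes: "5:14 am", no: "5:14" },
  { name: "Countdown timer", pattern: /\d{2}:\d{2}:\d{2}/, yes: "01:18:33", no: "1:18" },
  { name: "Exact word", pattern: /^dog$/, yes: "dog", no: "hot dog" },
];

// Record whether each pattern accepts the "yes" text and rejects the "no" text.
const results = samples.map(({ name, pattern, yes, no }) => ({
  name,
  matchesExpected: pattern.test(yes),
  rejectsUnexpected: !pattern.test(no),
}));

for (const r of results) {
  console.log(r.name, r.matchesExpected, r.rejectsUnexpected);
}
```

The `^dog$` entry shows the effect of the anchors: without `^` and `$`, the pattern `dog` would also match "hot dog".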
Special Characters in Regular Expressions
Special Characters
Regular expressions use special characters (metacharacters) to denote which characters to match in the source content. Be careful when a word or phrase you enter contains any of these characters, because they may be interpreted differently than you expect.
The special characters are:
- backslash \
- caret ^
- dollar sign $
- period or dot .
- vertical bar or pipe symbol |
- question mark ?
- asterisk or star *
- plus sign +
- opening parenthesis (
- closing parenthesis )
- opening square bracket [
- opening curly brace {
Escaping
Any special characters that can be used in regular expressions will need to be "escaped" if you want to match the literal character.
For example: If you want to exclude anything with .tif in the phrase, you would enter: \.tif
- The backslash tells the parser to consider the "." as a literal character.
- Without the backslash, .tif would match any phrase that has at least one character before "tif", e.g. any phrase containing the word "identify".
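The difference between the escaped and unescaped pattern can be checked directly. The sketch below also includes a small helper, `escapeRegex`, that puts a backslash in front of each special character in a literal string. The helper name is hypothetical and not part of Localize, and JavaScript `RegExp` behavior is assumed.

```javascript
// Hypothetical helper: prefix every regex special character with a backslash
// so that the string is matched literally.
function escapeRegex(literal) {
  return literal.replace(/[\\^$.|?*+()[\]{}]/g, "\\$&");
}

const unescaped = /.tif/; // "." matches any single character
const escaped = /\.tif/;  // "\." matches only a literal dot

// "identify" contains "ntif", so the unescaped pattern matches it.
const unescapedHitsIdentify = unescaped.test("Please identify yourself");
const escapedHitsIdentify = escaped.test("Please identify yourself");
const escapedHitsFile = escaped.test("Upload photo.tif");

// Escape a literal phrase before using it as a pattern.
const escapedLiteral = escapeRegex("photo.tif (v2)");
console.log(escapedLiteral);
```

Here only the unescaped pattern matches the "identify" sentence, while the escaped one matches only text that really contains ".tif".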
Unicode Regular Expressions
When using the Block New Phrases by Pattern setting, you can select the Unicode Regex option. This will automatically add the Unicode modifier /u to the end of the regex.
This lets you block source phrases based on the Unicode characters used in specific languages.
Here are several examples:
Block Phrases With... | Regular Expression | Examples |
---|---|---|
Arabic | [\p{sc=Arabic}] | حظر هذه العبارة. |
Cyrillic characters (e.g. Ukrainian, Bulgarian, etc.) | [\p{sc=Cyrillic}] | Заблокуйте цю фразу. |
Chinese | [\p{sc=Han}] | 屏蔽这句话。 |
Korean | [\p{sc=Hangul}] | 이 문구를 차단하세요. |
Japanese (Hiragana) | [\p{sc=Hiragana}] | このフレーズをブロックします。 |
Japanese (Katakana) | [\p{sc=Katakana}] | このフレーズをブロックします。 |
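Tai Viet script, a Brahmic script (e.g. Tai Dam, Tai Don, and Thai Song) | [\p{sc=Tai_Viet}] | Any phrase containing Tai Viet characters (U+AA80 to U+AADF). Vietnamese written in Latin letters, such as "Chặn cụm từ này.", is not matched. |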
Thai | [\p{sc=Thai}] | บล็อควลีนี้ |
For other scripts, put the script's name into the same pattern, e.g. [\p{sc=Greek}]. Multi-word names use underscores in the pattern, as in Tai_Viet. You can find the names of scripts here: ISO 15924 Alphabetical Code List
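To illustrate how script-based patterns behave, here is a minimal sketch in plain JavaScript, where the `u` flag plays the role of the /u modifier. The `scriptPattern` helper and the `unicodeChecks` object are hypothetical names used only for this example, and the dashboard's matching is assumed to behave like JavaScript's `RegExp`.

```javascript
// Hypothetical helper: build a "contains any character of this script" regex.
// scriptName is a Unicode script name such as "Greek" or "Tai_Viet".
function scriptPattern(scriptName) {
  return new RegExp(`[\\p{sc=${scriptName}}]`, "u");
}

const unicodeChecks = {
  // Patterns with the u flag match text written in the named script.
  han: /[\p{sc=Han}]/u.test("屏蔽这句话。"),
  cyrillic: /[\p{sc=Cyrillic}]/u.test("Заблокуйте цю фразу."),
  hangul: /[\p{sc=Hangul}]/u.test("이 문구를 차단하세요."),
  greek: scriptPattern("Greek").test("Γειά σου"),

  // A script pattern ignores text written in a different script.
  cyrillicOnEnglish: /[\p{sc=Cyrillic}]/u.test("Block this phrase."),

  // Vietnamese in Latin letters is Latin script, not Tai Viet.
  taiVietOnVietnamese: scriptPattern("Tai_Viet").test("Chặn cụm từ này."),
  latinOnVietnamese: scriptPattern("Latin").test("Chặn cụm từ này."),

  // Without the u flag, \p is not treated as a Unicode property escape,
  // so the same pattern no longer matches Han characters.
  hanWithoutU: /[\p{sc=Han}]/.test("屏蔽这句话。"),
};

console.log(unicodeChecks);
```

The last check shows why the Unicode Regex option matters: the same pattern text stops matching Han characters once the `u` flag is missing.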