Published by rahmadhany.triastanto, 2017-01-05 23:15:36

Regular Expression Basic Syntax Reference

Characters
Character:   Any character except [\^$.|?*+()
Description: All characters except the listed special characters match a single instance of themselves. { and } are literal characters, unless they're part of a valid regular expression token (e.g. the {n} quantifier).
Example:     a matches a

Character:   \ (backslash) followed by any of [\^$.|?*+(){}
Description: A backslash escapes special characters to suppress their special meaning.
Example:     \+ matches +

Character:   \Q...\E
Description: Matches the characters between \Q and \E literally, suppressing the meaning of special characters.
Example:     \Q+-*/\E matches +-*/

Character:   \xFF where FF are 2 hexadecimal digits
Description: Matches the character with the specified ASCII/ANSI value, which depends on the code page used. Can be used in character classes.
Example:     \xA9 matches © when using the Latin-1 code page.

Character:   \n, \r and \t
Description: Match an LF character, CR character and a tab character respectively. Can be used in character classes.
Example:     \r\n matches a DOS/Windows CRLF line break.

Character:   \a, \e, \f and \v
Description: Match a bell character (\x07), escape character (\x1B), form feed (\x0C) and vertical tab (\x0B) respectively. Can be used in character classes.

Character:   \cA through \cZ
Description: Match an ASCII character Control+A through Control+Z, equivalent to \x01 through \x1A. Can be used in character classes.
Example:     \cM\cJ matches a DOS/Windows CRLF line break.
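
The escapes above can be tried out with a short sketch. It assumes Python's `re` module, which is only one of many regex flavors and is not named by this reference. Python's `re` does not support `\Q...\E`, so `re.escape()` is used as a stand-in for literal quoting; `\e` and `\cA` through `\cZ` are likewise left out because `re` does not accept them.

```python
import re

# An ordinary character matches a single instance of itself.
plain = re.search(r"a", "cat").group()

# A backslash suppresses the special meaning of a metacharacter.
escaped_plus = re.search(r"\+", "1+1").group()

# Stand-in for \Q+-*/\E: re.escape() backslash-escapes the special characters.
literal = re.search(re.escape("+-*/"), "ops: +-*/").group()

# \xA9 is the copyright sign (code point 0xA9 in Latin-1 and Unicode).
copyright_sign = re.search(r"\xA9", "\u00a9 2017").group()

# \r\n matches a DOS/Windows CRLF line break.
crlf = re.search(r"\r\n", "line1\r\nline2").group()

# \a, \f and \v can be used inside a character class.
control_chars = re.findall(r"[\a\f\v]", "\x07\x0c\x0b")

print(plain, escaped_plus, literal, copyright_sign, repr(crlf), control_chars)
```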
Character Classes or Character Sets [abc]
Character:   [ (opening square bracket)
Description: Starts a character class. A character class matches a single character out of all the possibilities offered by the character class. Inside a character class, different rules apply. The rules in this section are only valid inside character classes. The rules outside this section are not valid in character classes, except for a few character escapes that are indicated with "can be used inside character classes".

Character:   Any character except ^-]\
Description: All characters except the listed special characters add that character to the possible matches for the character class.
Example:     [abc] matches a, b or c

Character:   \ (backslash) followed by any of ^-]\
Description: A backslash escapes special characters to suppress their special meaning.
Example:     [\^\]] matches ^ or ]

Character:   - (hyphen) except immediately after the opening [
Description: Specifies a range of characters. (Specifies a hyphen if placed immediately after the opening [.)
Example:     [a-zA-Z0-9] matches any letter or digit

Character:   ^ (caret) immediately after the opening [
Description: Negates the character class, causing it to match a single character not listed in the character class. (Specifies a caret if placed anywhere except after the opening [.)
Example:     [^a-d] matches x (any character except a, b, c or d)

Character:   \d, \w and \s
Description: Shorthand character classes matching digits, word characters (letters, digits, and underscores), and whitespace (spaces, tabs, and line breaks). Can be used inside and outside character classes.
Example:     [\d\s] matches a character that is a digit or whitespace

Character:   \D, \W and \S
Description: Negated versions of the above. Should be used only outside character classes. (Can be used inside, but that is confusing.)
Example:     \D matches a character that is not a digit

Character:   [\b]
Description: Inside a character class, \b is a backspace character.
Example:     [\b\t] matches a backspace or tab character
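
The character class rules above can be checked with a small sketch, again assuming Python's `re` module (an assumption; the table is flavor-neutral). The sample strings are made up for illustration.

```python
import re

# [abc] matches a single a, b or c.
abc = re.findall(r"[abc]", "cabbage")

# [\^\]] matches a literal caret or closing bracket.
caret_or_bracket = re.findall(r"[\^\]]", "x^y]z")

# [a-zA-Z0-9] matches any ASCII letter or digit; the hyphens define ranges.
alnum = re.findall(r"[a-zA-Z0-9]", "a-Z_9")

# [^a-d] matches any single character except a, b, c or d.
not_a_to_d = re.findall(r"[^a-d]", "abcdx")

# [\d\s] matches a digit or a whitespace character.
digit_or_space = re.findall(r"[\d\s]", "a1 b")

# Inside a class, \b is a backspace (\x08), not a word boundary.
backspace_or_tab = re.findall(r"[\b\t]", "a\bb\tc")

print(abc, caret_or_bracket, alnum, not_a_to_d, digit_or_space, backspace_or_tab)
```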

Dot
Character:   . (dot)
Description: Matches any single character except line break characters \r and \n. Most regex flavors have an option to make the dot match line break characters too.
Example:     . matches x or (almost) any other character
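
A minimal sketch of the dot and its line-break option, assuming Python's `re` module. In that flavor the option is the `re.DOTALL` flag, and by default only `\n` is excluded (other flavors may also exclude `\r`, as described above).

```python
import re

text = "ab\ncd"

# By default the dot skips the line break.
default_dot = re.findall(r".", text)

# With re.DOTALL the dot matches the line break as well.
dotall = re.findall(r".", text, re.DOTALL)

print(default_dot)
print(dotall)
```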

Anchors
Character:   ^ (caret)
Description: Matches at the start of the string the regex pattern is applied to. Matches a position rather than a character. Most regex flavors have an option to make the caret match after line breaks (i.e. at the start of a line in a file) as well.
Example:     ^. matches a in abc\ndef. Also matches d in "multi-line" mode.

Character:   $ (dollar)
Description: Matches at the end of the string the regex pattern is applied to. Matches a position rather than a character. Most regex flavors have an option to make the dollar match before line breaks (i.e. at the end of a line in a file) as well. Also matches before the very last line break if the string ends with a line break.
Example:     .$ matches f in abc\ndef. Also matches c in "multi-line" mode.

Character:   \A
Description: Matches at the start of the string the regex pattern is applied to. Matches a position rather than a character. Never matches after line breaks.
Example:     \A. matches a in abc

Character:   \Z
Description: Matches at the end of the string the regex pattern is applied to. Matches a position rather than a character. Never matches before line breaks, except for the very last line break if the string ends with a line break.
Example:     .\Z matches f in abc\ndef

Character:   \z
Description: Matches at the end of the string the regex pattern is applied to. Matches a position rather than a character. Never matches before line breaks.
Example:     .\z matches f in abc\ndef
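
The anchor behavior above can be sketched as follows, assuming Python's `re` module, where "multi-line" mode is the `re.MULTILINE` flag. One flavor difference worth flagging: in Python, `\Z` matches only at the very end of the string, so it behaves like the strict `\z` in this table rather than like the `\Z` row.

```python
import re

text = "abc\ndef"

# ^ matches at the start of the string; in multi-line mode also after line breaks.
start = re.findall(r"^.", text)
start_multi = re.findall(r"^.", text, re.MULTILINE)

# $ matches at the end of the string; in multi-line mode also before line breaks.
end = re.findall(r".$", text)
end_multi = re.findall(r".$", text, re.MULTILINE)

# \A never matches after a line break, even in multi-line mode.
start_only = re.findall(r"\A.", text, re.MULTILINE)

# $ also matches before a final trailing line break.
before_trailing = re.search(r".$", "abc\ndef\n").group()

# Python-specific: \Z is strict, so nothing matches before the trailing "\n".
strict_end = re.search(r".\Z", "abc\ndef\n")

print(start, start_multi, end, end_multi, start_only, before_trailing, strict_end)
```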

Word Boundaries
Character:   \b
Description: Matches at the position between a word character (anything matched by \w) and a non-word character (anything matched by [^\w] or \W) as well as at the start and/or end of the string if the first and/or last characters in the string are word characters.
Example:     .\b matches c in abc

Character:   \B
Description: Matches at the position between two word characters (i.e. the position between \w\w) as well as at the position between two non-word characters (i.e. \W\W).
Example:     \B.\B matches b in abc
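
A short sketch of the two boundary tokens, assuming Python's `re` module. The `\bcat\b` whole-word search is an added illustration, not an example from the table.

```python
import re

# .\b matches a character that is followed by a word boundary.
before_boundary = re.findall(r".\b", "abc")

# \B.\B matches a character with no word boundary on either side.
no_boundary = re.findall(r"\B.\B", "abc")

# Illustrative use: \b on both sides restricts the match to a whole word.
whole_word = re.findall(r"\bcat\b", "cat concat cats")

print(before_boundary, no_boundary, whole_word)
```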

Alternation
Character:   | (pipe)
Description: Causes the regex engine to match either the part on the left side, or the part on the right side. Can be strung together into a series of options.
Example:     abc|def|xyz matches abc, def or xyz

Character:   | (pipe)
Description: The pipe has the lowest precedence of all operators. Use grouping to alternate only part of the regular expression.
Example:     abc(def|xyz) matches abcdef or abcxyz
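
The precedence rule is easiest to see by comparing a grouped and an ungrouped alternation. This is a sketch assuming Python's `re` module; the sample strings are made up.

```python
import re

# A series of options: each alternative is tried at every position.
options = re.findall(r"abc|def|xyz", "abc def xyz ghi")

# Grouping limits the alternation to the part inside the parentheses.
grouped = [m.group() for m in re.finditer(r"abc(def|xyz)", "abcdef abcxyz abc")]

# Without grouping, the pipe splits the whole pattern: "abcdef" OR "xyz".
ungrouped = re.fullmatch(r"abcdef|xyz", "abcxyz")
with_group = re.fullmatch(r"abc(def|xyz)", "abcxyz")

print(options, grouped, ungrouped, with_group is not None)
```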

Quantifiers
Character:   ? (question mark)
Description: Makes the preceding item optional. Greedy, so the optional item is included in the match if possible.
Example:     abc? matches ab or abc

Character:   ??
Description: Makes the preceding item optional. Lazy, so the optional item is excluded from the match if possible. This construct is often excluded from documentation because of its limited use.
Example:     abc?? matches ab or abc

Character:   * (star)
Description: Repeats the previous item zero or more times. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is not matched at all.
Example:     ".*" matches "def" "ghi" in abc "def" "ghi" jkl

Character:   *? (lazy star)
Description: Repeats the previous item zero or more times. Lazy, so the engine first attempts to skip the previous item, before trying permutations with ever increasing matches of the preceding item.
Example:     ".*?" matches "def" in abc "def" "ghi" jkl

Character:   + (plus)
Description: Repeats the previous item once or more. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is matched only once.
Example:     ".+" matches "def" "ghi" in abc "def" "ghi" jkl

Character:   +? (lazy plus)
Description: Repeats the previous item once or more. Lazy, so the engine first matches the previous item only once, before trying permutations with ever increasing matches of the preceding item.
Example:     ".+?" matches "def" in abc "def" "ghi" jkl

Character:   {n} where n is an integer >= 1
Description: Repeats the previous item exactly n times.
Example:     a{3} matches aaa

Character:   {n,m} where n >= 0 and m >= n
Description: Repeats the previous item between n and m times. Greedy, so repeating m times is tried before reducing the repetition to n times.
Example:     a{2,4} matches aaaa, aaa or aa

Character:   {n,m}? where n >= 0 and m >= n
Description: Repeats the previous item between n and m times. Lazy, so repeating n times is tried before increasing the repetition to m times.
Example:     a{2,4}? matches aa, aaa or aaaa

Character:   {n,} where n >= 0
Description: Repeats the previous item at least n times. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is matched only n times.
Example:     a{2,} matches aaaaa in aaaaa

Character:   {n,}? where n >= 0
Description: Repeats the previous item n or more times. Lazy, so the engine first matches the previous item n times, before trying permutations with ever increasing matches of the preceding item.
Example:     a{2,}? matches aa in aaaaa

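
The greedy and lazy variants above differ only in which repetition count is tried first. A minimal sketch, assuming Python's `re` module and reusing the table's own sample text:

```python
import re

text = 'abc "def" "ghi" jkl'

# Greedy quantifiers take as much as possible, lazy ones as little as possible.
greedy_star = re.search(r'".*"', text).group()
lazy_star = re.search(r'".*?"', text).group()
greedy_plus = re.search(r'".+"', text).group()
lazy_plus = re.search(r'".+?"', text).group()

# Optional item: included when greedy, skipped when lazy.
optional_greedy = re.search(r"abc?", "abc").group()
optional_lazy = re.search(r"abc??", "abc").group()

# Counted repetition on a run of five a's.
exact = re.search(r"a{3}", "aaaaa").group()
range_greedy = re.search(r"a{2,4}", "aaaaa").group()
range_lazy = re.search(r"a{2,4}?", "aaaaa").group()
open_greedy = re.search(r"a{2,}", "aaaaa").group()
open_lazy = re.search(r"a{2,}?", "aaaaa").group()

print(greedy_star, lazy_star, greedy_plus, lazy_plus)
print(optional_greedy, optional_lazy)
print(exact, range_greedy, range_lazy, open_greedy, open_lazy)
```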
Sample Patterns

Description: Letters, numbers and hyphens
Pattern:     ([A-Za-z0-9-]+)

Description: Date (e.g. 21/3/2006)
Pattern:     (\d{1,2}\/\d{1,2}\/\d{4})

Description: jpg, gif or png image
Pattern:     ([^\s]+(?=\.(jpg|gif|png))\.\2)

Description: Any number from 1 to 50 inclusive
Pattern:     (^[1-9]{1}$|^[1-4]{1}[0-9]{1}$|^50$)

Description: Valid hexadecimal colour code
Pattern:     (#?([A-Fa-f0-9]){3}(([A-Fa-f0-9]){3})?)

Description: 8 to 15 character string with at least one upper case letter, one lower case letter, and one digit (useful for passwords).
Pattern:     ((?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,15})

Description: Email addresses
Pattern:     (\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,6})

Description: HTML Tags
Pattern:     (\<(/?[^\>]+)\>)
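
The sample patterns can be exercised with a small harness. This is a sketch assuming Python's `re` module; the dictionary keys and the helper names `first_match` and `is_valid` are hypothetical labels introduced here, and the test strings are made up. The patterns themselves are copied unchanged from the table.

```python
import re

# Patterns copied verbatim from the table; the keys are hypothetical labels.
PATTERNS = {
    "slug": r"([A-Za-z0-9-]+)",
    "date": r"(\d{1,2}\/\d{1,2}\/\d{4})",
    "image": r"([^\s]+(?=\.(jpg|gif|png))\.\2)",
    "one_to_fifty": r"(^[1-9]{1}$|^[1-4]{1}[0-9]{1}$|^50$)",
    "hex_colour": r"(#?([A-Fa-f0-9]){3}(([A-Fa-f0-9]){3})?)",
    "password": r"((?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,15})",
    "email": r"(\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,6})",
    "html_tag": r"(\<(/?[^\>]+)\>)",
}


def first_match(name, text):
    """Return the first substring matched by the named pattern, or None."""
    m = re.search(PATTERNS[name], text)
    return m.group(1) if m else None


def is_valid(name, text):
    """Return True only if the entire string matches the named pattern."""
    return re.fullmatch(PATTERNS[name], text) is not None


print(first_match("date", "due 21/3/2006"))
print(first_match("image", "see photo.png now"))
print(first_match("email", "mail jane_doe@example.com today"))
print(is_valid("one_to_fifty", "49"), is_valid("one_to_fifty", "51"))
print(is_valid("password", "Passw0rdOK"), is_valid("password", "password1"))
```

Most of these patterns carry no anchors, so a plain search will also find them inside longer strings. The sketch uses `re.fullmatch` in `is_valid` for that reason, which matters most for the password and colour-code patterns when they are used for validation.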

