| Name | Description |
|---|---|
NOT RGEXP |
Negation of REGEXP |
REGEXP |
Pattern matching using regular expressions |
RLIKE |
Synonym for REGEXP |
A regular expression is a powerful way of specifying a pattern for a complex search.
MySQL uses Henry Spencer's implementation of regular
expressions, which is aimed at conformance with POSIX 1003.2.
See Appendix F, Credits. MySQL uses the extended version
to support pattern-matching operations performed with the
REGEXP operator in SQL statements. See
Section 3.3.4.7, “Pattern Matching”, and
Section 12.4.1, “String Comparison Functions”.
This section is a summary, with examples, of the special
characters and constructs that can be used in MySQL for
REGEXP operations. It does not contain all
the details that can be found in Henry Spencer's
regex(7) manual page. That manual page is
included in MySQL source distributions, in the
regex.7 file under the
regex directory.
A regular expression describes a set of strings. The simplest
regular expression is one that has no special characters in it.
For example, the regular expression hello
matches hello and nothing else.
Non-trivial regular expressions use certain special constructs
so that they can match more than one string. For example, the
regular expression hello|word matches either
the string hello or the string
word.
As a more complex example, the regular expression
B[an]*s matches any of the strings
Bananas, Baaaaas,
Bs, and any other string starting with a
B, ending with an s, and
containing any number of a or
n characters in between.
A regular expression for the REGEXP operator
may use any of the following special characters and constructs:
^
Match the beginning of a string.
mysql>SELECT 'fo\nfo' REGEXP '^fo$';-> 0 mysql>SELECT 'fofo' REGEXP '^fo';-> 1
$
Match the end of a string.
mysql>SELECT 'fo\no' REGEXP '^fo\no$';-> 1 mysql>SELECT 'fo\no' REGEXP '^fo$';-> 0
.
Match any character (including carriage return and newline).
mysql>SELECT 'fofo' REGEXP '^f.*$';-> 1 mysql>SELECT 'fo\r\nfo' REGEXP '^f.*$';-> 1
a*
Match any sequence of zero or more a
characters.
mysql>SELECT 'Ban' REGEXP '^Ba*n';-> 1 mysql>SELECT 'Baaan' REGEXP '^Ba*n';-> 1 mysql>SELECT 'Bn' REGEXP '^Ba*n';-> 1
a+
Match any sequence of one or more a
characters.
mysql>SELECT 'Ban' REGEXP '^Ba+n';-> 1 mysql>SELECT 'Bn' REGEXP '^Ba+n';-> 0
a?
Match either zero or one a character.
mysql>SELECT 'Bn' REGEXP '^Ba?n';-> 1 mysql>SELECT 'Ban' REGEXP '^Ba?n';-> 1 mysql>SELECT 'Baan' REGEXP '^Ba?n';-> 0
de|abc
Match either of the sequences de or
abc.
mysql>SELECT 'pi' REGEXP 'pi|apa';-> 1 mysql>SELECT 'axe' REGEXP 'pi|apa';-> 0 mysql>SELECT 'apa' REGEXP 'pi|apa';-> 1 mysql>SELECT 'apa' REGEXP '^(pi|apa)$';-> 1 mysql>SELECT 'pi' REGEXP '^(pi|apa)$';-> 1 mysql>SELECT 'pix' REGEXP '^(pi|apa)$';-> 0
(abc)*
Match zero or more instances of the sequence
abc.
mysql>SELECT 'pi' REGEXP '^(pi)*$';-> 1 mysql>SELECT 'pip' REGEXP '^(pi)*$';-> 0 mysql>SELECT 'pipi' REGEXP '^(pi)*$';-> 1
{1}, {2,3}
{n} or {m,n} notation
provides a more general way of writing regular expressions
that match many occurrences of the previous atom (or
“piece”) of the pattern. m
and n are integers.
a*
Can be written as a{0,}.
a+
Can be written as a{1,}.
a?
Can be written as a{0,1}.
To be more precise, a{n} matches exactly
n instances of a.
a{n,} matches n or
more instances of a.
a{m,n} matches m
through n instances of
a, inclusive.
m and n must be in the
range from 0 to
RE_DUP_MAX (default 255), inclusive. If
both m and n are
given, m must be less than or equal to
n.
mysql>SELECT 'abcde' REGEXP 'a[bcd]{2}e';-> 0 mysql>SELECT 'abcde' REGEXP 'a[bcd]{3}e';-> 1 mysql>SELECT 'abcde' REGEXP 'a[bcd]{1,10}e';-> 1
[a-dX], [^a-dX]
Matches any character that is (or is not, if ^ is used)
either a, b,
c, d or
X. A - character
between two other characters forms a range that matches all
characters from the first character to the second. For
example, [0-9] matches any decimal digit.
To include a literal ] character, it must
immediately follow the opening bracket [.
To include a literal - character, it must
be written first or last. Any character that does not have a
defined special meaning inside a [] pair
matches only itself.
mysql>SELECT 'aXbc' REGEXP '[a-dXYZ]';-> 1 mysql>SELECT 'aXbc' REGEXP '^[a-dXYZ]$';-> 0 mysql>SELECT 'aXbc' REGEXP '^[a-dXYZ]+$';-> 1 mysql>SELECT 'aXbc' REGEXP '^[^a-dXYZ]+$';-> 0 mysql>SELECT 'gheis' REGEXP '^[^a-dXYZ]+$';-> 1 mysql>SELECT 'gheisa' REGEXP '^[^a-dXYZ]+$';-> 0
[.characters.]
Within a bracket expression (written using
[ and ]), matches the
sequence of characters of that collating element.
characters is either a single character
or a character name like newline. The
following table lists the allowable character names.
The following table shows the allowable character names and the characters that they match. For characters given as numeric values, the values are represented in octal.
| Name | Character | Name | Character |
NUL |
0 |
SOH |
001 |
STX |
002 |
ETX |
003 |
EOT |
004 |
ENQ |
005 |
ACK |
006 |
BEL |
007 |
alert |
007 |
BS |
010 |
backspace |
'\b' |
HT |
011 |
tab |
'\t' |
LF |
012 |
newline |
'\n' |
VT |
013 |
vertical-tab |
'\v' |
FF |
014 |
form-feed |
'\f' |
CR |
015 |
carriage-return |
'\r' |
SO |
016 |
SI |
017 |
DLE |
020 |
DC1 |
021 |
DC2 |
022 |
DC3 |
023 |
DC4 |
024 |
NAK |
025 |
SYN |
026 |
ETB |
027 |
CAN |
030 |
EM |
031 |
SUB |
032 |
ESC |
033 |
IS4 |
034 |
FS |
034 |
IS3 |
035 |
GS |
035 |
IS2 |
036 |
RS |
036 |
IS1 |
037 |
US |
037 |
space |
' ' |
exclamation-mark |
'!' |
quotation-mark |
'"' |
number-sign |
'#' |
dollar-sign |
'$' |
percent-sign |
'%' |
ampersand |
'&' |
apostrophe |
'\'' |
left-parenthesis |
'(' |
right-parenthesis |
')' |
asterisk |
'*' |
plus-sign |
'+' |
comma |
',' |
hyphen |
'-' |
hyphen-minus |
'-' |
period |
'.' |
full-stop |
'.' |
slash |
'/' |
solidus |
'/' |
zero |
'0' |
one |
'1' |
two |
'2' |
three |
'3' |
four |
'4' |
five |
'5' |
six |
'6' |
seven |
'7' |
eight |
'8' |
nine |
'9' |
colon |
':' |
semicolon |
';' |
less-than-sign |
'<' |
equals-sign |
'=' |
greater-than-sign |
'>' |
question-mark |
'?' |
commercial-at |
'@' |
left-square-bracket |
'[' |
backslash |
'\\' |
reverse-solidus |
'\\' |
right-square-bracket |
']' |
circumflex |
'^' |
circumflex-accent |
'^' |
underscore |
'_' |
low-line |
'_' |
grave-accent |
'`' |
left-brace |
'{' |
left-curly-bracket |
'{' |
vertical-line |
'|' |
right-brace |
'}' |
right-curly-bracket |
'}' |
tilde |
'~' |
DEL |
177 |
mysql>SELECT '~' REGEXP '[[.~.]]';-> 1 mysql>SELECT '~' REGEXP '[[.tilde.]]';-> 1
[=character_class=]
Within a bracket expression (written using
[ and ]),
[=character_class=] represents an
equivalence class. It matches all characters with the same
collation value, including itself. For example, if
o and (+) are the
members of an equivalence class, then
[[=o=]], [[=(+)=]],
and [o(+)] are all synonymous. An
equivalence class may not be used as an endpoint of a range.
[:character_class:]
Within a bracket expression (written using
[ and ]),
[:character_class:] represents a
character class that matches all characters belonging to
that class. The following table lists the standard class
names. These names stand for the character classes defined
in the ctype(3) manual page. A particular
locale may provide other class names. A character class may
not be used as an endpoint of a range.
alnum |
Alphanumeric characters |
alpha |
Alphabetic characters |
blank |
Whitespace characters |
cntrl |
Control characters |
digit |
Digit characters |
graph |
Graphic characters |
lower |
Lowercase alphabetic characters |
print |
Graphic or space characters |
punct |
Punctuation characters |
space |
Space, tab, newline, and carriage return |
upper |
Uppercase alphabetic characters |
xdigit |
Hexadecimal digit characters |
mysql>SELECT 'justalnums' REGEXP '[[:alnum:]]+';-> 1 mysql>SELECT '!!' REGEXP '[[:alnum:]]+';-> 0
[[:<:]], [[:>:]]
These markers stand for word boundaries. They match the
beginning and end of words, respectively. A word is a
sequence of word characters that is not preceded by or
followed by word characters. A word character is an
alphanumeric character in the alnum class
or an underscore (_).
mysql>SELECT 'a word a' REGEXP '[[:<:]]word[[:>:]]';-> 1 mysql>SELECT 'a xword a' REGEXP '[[:<:]]word[[:>:]]';-> 0
To use a literal instance of a special character in a regular
expression, precede it by two backslash (\) characters. The
MySQL parser interprets one of the backslashes, and the regular
expression library interprets the other. For example, to match
the string 1+2 that contains the special
+ character, only the last of the following
regular expressions is the correct one:
mysql>SELECT '1+2' REGEXP '1+2';-> 0 mysql>SELECT '1+2' REGEXP '1\+2';-> 0 mysql>SELECT '1+2' REGEXP '1\\+2';-> 1

User Comments
If you are searching for literal parentheses, you have to enclose each parenthesis in brackets; otherwise, mySQL thinks they're part of the regular expression syntax. For instance:
WHERE phone REGEXP '(435)';
would return any phone numbers that have the sequence 435 in any part of the string, such as "1(801)555-4351". However:
WHERE phone REGEXP '[(]435[)]';
would return only phone numbers with (435), such as "1(435)555-5555".
It's far beyond the scope of this documentation to dwell on all the gory details of regular expressions. Should you have any doubts, please refer to a good book on the subject like "Mastering Regular Expressions" (http://www.oreilly.com/catalog/regex/). References online include http://sitescooper.org/tao_regexps.html and http://www.regexlib.com/
The regular expression support in MySQL seems to be based on traditional-style regex (like ereg() in PHP), not the more sophisticated regular expression matching found in Perl or PHP's preg_match(). And in case the above doesn't make it clear (being mostly SELECT statements using the function directly to return a 1 or 0), you typically would use the REGEXP function in a WHERE clause like this:
SELECT * FROM foo WHERE bar REGEXP "baz"
To match a "special" character such as $, you need to prefix it with the backslash \ character. So \$ matches an actual dollar sign. However, in almost any programming language that claims a "C-like" syntax, that backslash is likely to get picked up on as a special character. So you may need to use an extra backslash. Also, at least in Perl and PHP, the $ is a special character itself, because it indicates that what follows is a variable name - so it will need a backslash too.
In Perl or PHP, you probably will write something like this to match on a line starting with a $ sign:
$query = "SELECT * FROM `foo` WHERE `bar` REGEXP \"^\\\$\""
I'll explain the special characters in that and what they mean:
backslash, speech mark = a literal speech mark
HAT sign = beginning of line
two backslashes = a literal backslash
backslash, dollar = a literal dollar sign
backslash, speech mark = a literal speech mark
Now if you print $query, it will have the value
SELECT * FROM `foo` WHERE `bar` REGEXP "^\$"
which is what you really want, and how you would type it into the mysql command line. Remember also that PHPMyAdmin expects you to put a backslash before a backslash or apostrophe. So in PHPMyAdmin you would enter
SELECT * FROM `foo` WHERE `bar` REGEXP "^\\$"
I guess if you only want to use . and .* regular expressions, you may as well stick to using LIKE with the _ and % wildcards, as that is probably a bit faster. Finally, when using regular expressions in *any* language you need to watch out, because it is very easy to write ones that will always match, and almost as easy to write ones that will never match anything. So do check!
The comment above regarding the need to enclose literal parenthesis in square brackets applies to the literal period also - that is you cannot use \. -- the slash will be ignored, and the period will match anything. Using [.] will work fine.
Michael Webb's comment is wrong; MySQL does not simply "ignore" the slash. Read the manual:
"To use a literal instance of a special character in a regular expression, precede it by two backslash (\) characters. The MySQL parser interprets one of the backslashes, and the regular expression library interprets the other."
So if you want to match a literal period, put two backslashes in front of it. This will work fine:
SELECT * FROM `bleck` WHERE `mleko` REGEXP '\\.'
It works for parentheses too:
SELECT * FROM `topok` WHERE `fazzle` REGEXP '\\(435\\)'
Regexp's are pretty complicated. If you need anything more complicated than what's shown above, a good site to learn how to use them is http://www.regular-expressions.info. The sites listed in the second comment are probably a good idea too.
Also, don't get in the habit of using character classes to escape metacharacters (like using [(] to match a literal parenthese). The open parentheses '(' and ')' have no special meaning inside a character class, but others like $ retain their meaning. Using [$] won't match the character '$', it'll still match the beginning of the string.
So escape them properly with backslashes.
So what if your PHP code has something that looks like...
mysql_query("select * from a where Name rlike '(^|//////|)example'")
...just to match a string that starts with "example" or contains the string "|example". It looks terrible and it works.
For those of you struggling to escape special characters with long sequences of backslashes (see Vaz's post), I have to ask: why bother? Why not just use the dot-character syntax mentioned on this very page? For example, I needed to find all the ID attributes in some HTML. I tried escaping single and double-quotes for about 30 seconds, then I just switched to this:
SELECT * FROM site WHERE html REGEXP "id=[[.apostrophe.][.quotation-mark.]]archives[[.apostrophe.][.quotation-mark.]]";
Ta da. No escaping issues.
For those of you struggling to escape special characters with long sequences of backslashes (see Vaz's post), I have to ask: why bother? Why not just use the dot-character syntax mentioned on this very page? For example, I needed to find all the ID attributes in some HTML. I tried escaping single and double-quotes for about 30 seconds, then I just switched to this:
SELECT * FROM site WHERE html REGEXP "id=[[.apostrophe.][.quotation-mark.]]archives[[.apostrophe.][.quotation-mark.]]";
Ta da. No escaping issues.
This sql statements:
SELECT 'WORD' REGEXP '[[:upper:]]{4}'; # => 1;
SELECT 'WORD' REGEXP '[[:lower:]]{4}'; # => 0
work right only when collate is _cs and NOT _ci (case insensitive)
created tables eg. the collate 'latin1_swedish_ci' have to be changed, if you want to use case sensitive REGEXPs like [[:upper:]] or [[:lower:]]!
ALTER TABLE <name> CONVERT TO CHARACTER SET latin1 COLLATE latin_general_cs
I set in my.cnf now:
[mysqld]
default-collation= latin1_general_cs
#default was latin1_swedish_ci
The above post by Guido Dieterich (about collation and case sensitivity) is a good point. However, there is a way to match in a case-sensitive manner without having to change the collation of your existing tables: use the "BINARY" keyword.
Here's an extended example (based on the one previously posted):
SELECT
('WORD' REGEXP '[[:upper:]]{4}') AS `upper_match`, # this will be a 1
('WORD' REGEXP '[[:lower:]]{4}') AS `lower_match`, # this will be a 1 on an "*_ci" collation
# -BINARY- matches below
(BINARY 'WORD' REGEXP '[[:upper:]]{4}') AS `bin_upper_match`, # this will be a 1
(BINARY 'WORD' REGEXP '[[:lower:]]{4}') AS `bin_lower_match` # this will be a 0 even on an "*_ci" collation
The query "SELECT * FROM table WHERE text REGEXP 'UPPER'" on a *.ci (e.g. latin1_general_ci) table will find any case insensetive words, even words like "upper" or "uPpOr", or "UpPOr", etc...
To avoid this use one of the following Methods:
SELECT * FROM table WHERE text COLLATE latin1_general_cs REGEXP '...'
OR
SELECT * FROM table WHERE CAST(x AS BINARY) REGEXP '...'
Add your own comment.