Java regular expressions (regex) are a powerful tool for searching, matching, and manipulating text. A regular expression is a pattern written in a compact pattern-matching language: you define the pattern, then search for it in a string. Java regex is widely used in web development, data processing, and text analysis.
Java's regex syntax is closely modeled on Perl's. It uses special characters (metacharacters) and operators to define patterns. For example, the dot (.) matches any single character except a line terminator, while the asterisk (*) matches zero or more occurrences of the preceding element.
Java regex provides a wide range of features, including character classes, quantifiers, anchors, and groups. Character classes match one character from a set, such as digits, letters, or punctuation. Quantifiers specify how many times the preceding element may occur: one or more, zero or one, or a specific number. Anchors match a position rather than a character, such as the beginning or end of the input or a word boundary. Groups combine several elements into one unit so that operators such as quantifiers apply to the whole unit, and they capture the matched text for later use.
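A minimal sketch of how grouping changes what a quantifier applies to, and how capturing groups expose the matched pieces. The class and method names (`GroupDemo`, `fullMatch`, `dateParts`) are illustrative only; note that every regex backslash must be doubled inside a Java string literal, so `\d` is written `"\\d"`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // True only if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Splits a "yyyy-mm-dd" date into year, month, and day using three capturing groups.
    static String[] dateParts(String date) {
        Matcher m = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})").matcher(date);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a yyyy-mm-dd date: " + date);
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        // Without a group, + applies only to 'a'.
        System.out.println(fullMatch("ha+", "haaa"));     // true
        System.out.println(fullMatch("ha+", "hahaha"));   // false
        // With (ha)+, the quantifier applies to the whole group "ha".
        System.out.println(fullMatch("(ha)+", "hahaha")); // true
        System.out.println(String.join("/", dateParts("2024-03-15"))); // 2024/03/15
    }
}
```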
Java regex is implemented in the java.util.regex package, which provides a set of classes and methods for working with regular expressions. The most commonly used classes are Pattern and Matcher. Pattern represents a compiled regular expression, while Matcher is used to match the pattern against a string.
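The typical Pattern/Matcher workflow can be sketched as follows: compile the expression once, create a Matcher for each input, then either loop over `find()` to locate successive matches or call `matches()` to test the whole input. The class and method names here are hypothetical; `Pattern.compile`, `matcher`, `find`, `group`, and `matches` are standard `java.util.regex` methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternMatcherDemo {
    // Compile once and reuse: Pattern instances are immutable and thread-safe.
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Collects every substring of the input that matches NUMBER, in order.
    static List<String> findNumbers(String input) {
        List<String> results = new ArrayList<>();
        Matcher m = NUMBER.matcher(input); // a Matcher holds match state and is not thread-safe
        while (m.find()) {                 // find() advances to the next match, if any
            results.add(m.group());        // group() returns the text of the current match
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(findNumbers("Order 66 shipped 3 items")); // [66, 3]
        // matches() succeeds only if the entire input matches the pattern.
        System.out.println(NUMBER.matcher("2024").matches());  // true
        System.out.println(NUMBER.matcher("2024a").matches()); // false
    }
}
```

Compiling the pattern into a static field avoids recompiling it on every call; convenience methods such as `String.matches` recompile the pattern each time.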
This cheat sheet provides an overview of the most commonly used regex syntax in Java.
Basic Syntax
Syntax | Description |
---|---|
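`.` | Matches any single character except a line terminator (unless the DOTALL flag is set) |
`^` | Matches the beginning of the input (or of each line when the MULTILINE flag is set) |
`$` | Matches the end of the input (or of each line when the MULTILINE flag is set) |
`[]` | Matches any single character within the brackets, e.g. `[abc]` |
`[^]` | Matches any single character not within the brackets, e.g. `[^abc]` |
`\|` | Matches either the expression before or after the `\|` (alternation) |
`()` | Groups expressions together and captures the matched text |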
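The following sketch exercises each construct in the basic syntax table. It assumes a small hypothetical helper, `findAll`, that returns every match; the point to notice is that in Java `^` matches only at the start of the input unless `Pattern.MULTILINE` is passed, and `.` does not match `\n` unless `Pattern.DOTALL` is passed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BasicSyntaxDemo {
    // Returns every match of regex in input, compiled with the given flags.
    static List<String> findAll(String regex, int flags, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // The dot matches any one character, but not a line terminator by default.
        System.out.println(findAll("c.t", 0, "cat cot c\nt"));          // [cat, cot]

        String text = "cat\ndog\ncow";
        // Without MULTILINE, ^ anchors only at the start of the whole input.
        System.out.println(findAll("^\\w+", 0, text));                   // [cat]
        // With MULTILINE, ^ also matches right after each line terminator.
        System.out.println(findAll("^\\w+", Pattern.MULTILINE, text));   // [cat, dog, cow]

        // [cd] is a character class; (at|og) groups an alternation.
        System.out.println(findAll("[cd](at|og)", 0, "cat dog cow"));   // [cat, dog]
        // [^aeiou\s] matches one character that is neither a vowel nor whitespace.
        System.out.println(findAll("[^aeiou\\s]", 0, "a cat"));        // [c, t]
    }
}
```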
Character Classes
Syntax | Description |
---|---|
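`\d` | Matches any digit (`[0-9]` by default) |
`\D` | Matches any non-digit (`[^0-9]`) |
`\s` | Matches any whitespace character (space, tab, newline, carriage return, form feed, or vertical tab) |
`\S` | Matches any non-whitespace character |
`\w` | Matches any word character: an ASCII letter, digit, or underscore (`[a-zA-Z_0-9]`) |
`\W` | Matches any non-word character |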
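A short sketch of the predefined character classes used with the standard `String.replaceAll` and `String.split` methods. The helper names are hypothetical, chosen only to describe what each pattern does.

```java
import java.util.Arrays;

public class CharacterClassDemo {
    // \d matches one digit; each digit is replaced with '#'.
    static String maskDigits(String s) {
        return s.replaceAll("\\d", "#");
    }

    // \D+ matches a run of non-digits; removing those runs leaves only the digits.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D+", "");
    }

    // \W+ matches a run of non-word characters; '_' counts as a word character.
    static String[] words(String s) {
        return s.split("\\W+");
    }

    // \s+ matches a run of whitespace; each run collapses to a single space.
    static String collapseSpaces(String s) {
        return s.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String input = "id_42, name: Ada";
        System.out.println(maskDigits(input));             // id_##, name: Ada
        System.out.println(digitsOnly(input));             // 42
        System.out.println(Arrays.toString(words(input))); // [id_42, name, Ada]
        System.out.println(collapseSpaces("a \t b\n\nc")); // a b c
    }
}
```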
Quantifiers
Syntax | Description |
---|---|
`*` | Matches zero or more occurrences of the preceding expression |
`+` | Matches one or more occurrences of the preceding expression |
`?` | Matches zero or one occurrence of the preceding expression |
`{n}` | Matches exactly n occurrences of the preceding expression |
`{n,}` | Matches n or more occurrences of the preceding expression |
`{n,m}` | Matches between n and m occurrences of the preceding expression |
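One way to see each quantifier's bounds is to test whole-string matches with `Pattern.matches`, which succeeds only when the entire input fits the pattern. The `QuantifierDemo` class and `fullMatch` helper are hypothetical names for this sketch.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {
    // True only if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("colou?r", "color"));    // true:  ? allows zero 'u'
        System.out.println(fullMatch("colou?r", "colour"));   // true:  ? allows one 'u'
        System.out.println(fullMatch("\\d+", ""));            // false: + needs at least one digit
        System.out.println(fullMatch("\\d*", ""));            // true:  * accepts zero digits
        System.out.println(fullMatch("\\d{3}", "123"));       // true:  {3} means exactly three
        System.out.println(fullMatch("\\d{2,}", "1"));        // false: {2,} needs at least two
        System.out.println(fullMatch("[a-z]{2,4}", "abcde")); // false: {2,4} allows at most four
    }
}
```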
Anchors and Lookaround
Syntax | Description |
---|---|
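`\b` | Matches a word boundary |
`\B` | Matches a non-word boundary |
`(?=...)` | Positive lookahead: succeeds if `...` matches next, without consuming it |
`(?!...)` | Negative lookahead: succeeds if `...` does not match next |
`(?<=...)` | Positive lookbehind: succeeds if `...` matches immediately before the current position |
`(?<!...)` | Negative lookbehind: succeeds if `...` does not match immediately before the current position |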
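Word boundaries and lookaround are zero-width: they test a condition at a position without adding characters to the match. The sketch below uses a hypothetical `findAll` helper so that the printed lists show exactly which text each pattern consumes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundDemo {
    // Returns every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // \b on both sides: "cat" only as a whole word, not inside "concat" or "cats".
        System.out.println(findAll("\\bcat\\b", "cat concat cats"));           // [cat]

        String stats = "10% off, 20 items, 5%";
        // Positive lookahead: digits followed by '%'; the '%' is not part of the match.
        System.out.println(findAll("\\d+(?=%)", stats));                      // [10, 5]
        // Negative lookahead: whole numbers NOT followed by '%'.
        System.out.println(findAll("\\b\\d+\\b(?!%)", stats));                // [20]

        String price = "cost: $30, qty 4";
        // Positive lookbehind: digits preceded by '$'.
        System.out.println(findAll("(?<=\\$)\\d+", price));                   // [30]
        // Negative lookbehind: numbers NOT preceded by '$'.
        System.out.println(findAll("(?<!\\$)\\b\\d+", price));                // [4]
    }
}
```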
Examples
Regex | Description |
---|---|
`\d{3}-\d{2}-\d{4}` | Matches a U.S. Social Security number in the format ###-##-#### |
`^[A-Z][a-z]*$` | Matches a string that starts with an uppercase letter followed by zero or more lowercase letters |
`(\d{3})-\d{3}-\d{4}` | Matches a phone number in the format ###-###-#### and captures the first three digits in group 1 |
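A sketch of the examples above in Java code, with each backslash doubled for the string literal. The phone pattern here is written for a ###-###-#### layout (an assumption for this sketch), and `areaCode` is a hypothetical helper that returns capturing group 1 if a phone number is found.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExamplesDemo {
    static final Pattern SSN = Pattern.compile("\\d{3}-\\d{2}-\\d{4}");
    static final Pattern CAPITALIZED = Pattern.compile("^[A-Z][a-z]*$");
    // Assumed ###-###-#### layout; group 1 captures the first three digits.
    static final Pattern PHONE = Pattern.compile("(\\d{3})-\\d{3}-\\d{4}");

    // Returns the first three digits of the first phone number in the input, if any.
    static Optional<String> areaCode(String input) {
        Matcher m = PHONE.matcher(input);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // find() locates the SSN anywhere in the text because the pattern has no anchors.
        System.out.println(SSN.matcher("SSN: 123-45-6789").find()); // true
        System.out.println(CAPITALIZED.matcher("Hello").matches());  // true
        System.out.println(CAPITALIZED.matcher("hello").matches());  // false
        System.out.println(areaCode("Call 555-123-4567 today"));      // Optional[555]
    }
}
```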