Java Regular Expression (Java Regex) with Examples
Java regular expressions may sound confusing and difficult at first, but we will boil them down for you. To understand regular expressions, you first need to understand why they are needed. Consider a real-life example, where you need to remember a code for a gift.
You agree to do so because you believe that you have a good memory. However, you are shocked when you see the code being around 200 characters long! You immediately realize that you are in trouble because it is impossible for you to remember that. (Because not all of us are blessed with Sheldon’s memory!). So you return from where you started, disappointed, because you know you will not be able to remember it.
Luckily, someone asks you to say what it looked like, what its first and last characters were, and so on. This is easy, because you do remember that! So you tell him, and he figures out the rest of the code with the help of his Java program. What you gave him was a regular expression for the code: what it began with, what it ended with, which characters it contained, and so on. Let us learn more about regular expressions in Java with examples.
Regular Expressions in Java
Regular expressions in Java are sequences of characters that define a search pattern. They are used extensively in find-and-replace algorithms and in search engines, which filter strings based on a particular pattern.
The American mathematician Stephen Cole Kleene introduced the concept of regular expressions in the 1950s. In Java, the java.util.regex package provides the support for regular expressions.
It has the following interfaces and classes:
- MatchResult interface
- Matcher class
- Pattern class
- PatternSyntaxException class
1. MatchResult Interface in Java
This interface determines the result of a match operation for a regular expression. You can see the group and the match boundaries but you cannot modify the result through the MatchResult interface.
This interface has the following methods:
a. end()
This method returns the offset just after the last character matched, i.e. the index of the last matched character plus one.
An overloaded version takes a group number and returns the offset just after the last character captured by that group during the match.
The syntax is:
int end() or int end(int group)
b. start()
In contrast to the end() method, the start() method returns the index of the first character of the match. An overloaded version takes a group number and returns the start index of the subsequence captured by that group. The syntax of the start() method is as follows:
int start() or int start(int group)
c. group()
This method returns the input subsequence matched by the previous match. An overloaded version takes a group number and returns the subsequence captured by that group during the previous match operation. The syntax of the group method is:
String group() or String group(int group)
d. groupCount()
It returns the number of capturing groups in the pattern. This is the number of parenthesized groups in the regular expression, not the number of times the pattern matched the input. Group zero, which stands for the whole match, is not included in the count. The syntax is:
int groupCount()
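The four MatchResult methods above can be tried out together in a small sketch. The class name MatchResultSketch, the helper firstRange and the sample pattern are made up for illustration; toMatchResult() is the Matcher method that returns a read-only snapshot of the current match.

```java
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchResultSketch {
    // Hypothetical helper: returns a read-only snapshot of the first
    // "number-number" match in the input, or null when nothing matches.
    static MatchResult firstRange(String input) {
        Matcher m = Pattern.compile("(\\d+)-(\\d+)").matcher(input);
        return m.find() ? m.toMatchResult() : null;
    }

    public static void main(String[] args) {
        MatchResult r = firstRange("pages 12-34 only");
        System.out.println(r.group());                  // 12-34
        System.out.println(r.start() + " " + r.end());  // 6 11 (end is exclusive)
        System.out.println(r.group(2));                 // 34
        System.out.println(r.start(2) + " " + r.end(2)); // 9 11
        System.out.println(r.groupCount());             // 2 capturing groups
    }
}
```

Note that groupCount() reports 2 here because the pattern contains two pairs of capturing parentheses, regardless of how often the pattern occurs in the input.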
2. Matcher class in Java
This class implements the MatchResult interface. As the name suggests, it performs match operations on a character sequence. A Matcher is not constructed directly; it is obtained by calling the matcher() method of a Pattern. Its commonly used methods are:
a. boolean matches()
This method attempts to match the entire input sequence against the pattern and returns true only if the whole sequence matches. It takes no arguments, because the regex and the input are already fixed when the Matcher is created. The syntax of this method is:
boolean matches()
b. find()
This method scans the input for the next subsequence that matches the pattern. It is useful when we are searching for multiple occurrences: each call resumes where the previous match ended, and returns true if another match is found and false otherwise. The syntax of this method is:
boolean find()
c. group()
This method returns, as a String, the input subsequence matched by the previous match operation. It has the syntax:
String group();
d. start()
This returns the starting index of the subsequence which has matched. This has the syntax:
int start();
e. end()
This returns the offset just after the last character of the subsequence that matched. It has the syntax:
int end();
Used together with find(), the start() and end() methods return the boundaries of the match that find() located. Because end() is exclusive, the program below prints m.end() - 1 to show the index of the last matched character.
f. groupCount()
This method returns the number of capturing groups in the matcher's pattern, not the number of matches found. It has the syntax:
int groupCount();
Java program to illustrate the use of Matcher class:
import java.util.regex.*;

public class PatternJavaMatcherClass {
    public static void main(String args[]) {
        // Let's define a pattern to search for.
        Pattern pattern = Pattern.compile("Da*");

        // Search for the above pattern in "Data-Flair.training"
        Matcher m = pattern.matcher("Data-Flair.training");

        // Print the starting and ending indexes of the pattern in the text
        System.out.println("Searching for pattern " + pattern + " in Data-Flair.training");
        while (m.find())
            System.out.println("Pattern found from " + m.start() + " to " + (m.end() - 1));
    }
}
Output
Searching for pattern Da* in Data-Flair.training
Pattern found from 0 to 1
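The difference between matches() and find() is easy to miss, so here is a minimal sketch of it. The class MatcherSketch and its two helper methods are illustrative names, not part of java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherSketch {
    // True only if the WHOLE input matches the regex.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Collects every match found by repeated find() calls as
    // "text@start-end", where end is exclusive.
    static List<String> allMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("\\d+", "a1 b22 c333")); // false
        System.out.println(wholeMatch("\\d+", "333"));         // true
        System.out.println(allMatches("\\d+", "a1 b22 c333")); // [1@1-2, 22@4-6, 333@8-11]
    }
}
```

matches() fails on the first input because letters and spaces surround the digits, while find() still locates each run of digits inside it.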
3. Pattern Class in Java
Previously we learned about the Matcher class, which checks a character sequence for matches. The Pattern class represents a compiled regular expression for the regex engine to work on. There are no public constructors in this class; instead, the static compile method converts a regular expression into a Pattern, as elaborated below.
It has the following methods.
a. compile
This returns a Pattern instance after compiling the regular expression. An overloaded version also accepts flags, such as Pattern.CASE_INSENSITIVE, that change how the pattern is matched. It has the syntax:
static Pattern compile(String regex) or static Pattern compile(String regex, int flags)
b. matcher
This method creates a Matcher that will match the given input against the compiled pattern. It has the syntax:
Matcher matcher(CharSequence input)
c. matches
This method compiles the regex and matches it against the whole character sequence in a single call. It is convenient for one-off checks, but the regex is compiled again on every call. It has the syntax:
static boolean matches(String regex, CharSequence input);
d. split()
The split method splits the input character sequence around the matches of the pattern. An overloaded version takes an extra limit argument which, when positive, caps the number of pieces returned. The syntax of this method is:
String[] split(CharSequence input)
or
String[] split(CharSequence input, int limit)
e. pattern()
The pattern method returns the regular expression from the pattern compiled by the compile method. The syntax of the method is as follows:
String pattern()
f. quote()
This returns a literal pattern String for the specified string. Metacharacters and escape sequences in the argument lose their special meaning, so the result can be compiled into a pattern that matches the string exactly. The syntax of this method is:
static String quote(String s)
g. toString()
This method returns the string representation of the pattern, which is the regular expression from which it was compiled. It has the syntax:
String toString()
Java program to illustrate the Java Pattern class:
package com.dataflair.regexjava;

import java.util.regex.*;

public class PatternJava {
    public static void main(String[] args) {
        // We define the regular expression first.
        // For that we use a Pattern object and compile the pattern.
        System.out.println("The first method of using regex");
        Pattern p = Pattern.compile(".ataFlair"); // the dot (.) represents a single character
        Matcher m = p.matcher("DataFlair");
        boolean isMatch = m.matches();
        System.out.println(isMatch);
        // This is the first way of creating a regex program

        // The second way of using regex in Java is
        System.out.println("The second method of using regex");
        boolean isMatch2 = Pattern.compile(".ataFlair").matcher("DataFlair").matches();
        System.out.println(isMatch2);

        // The third and the simplest way of creating regex
        System.out.println("The third method of using regex");
        boolean isMatch3 = Pattern.matches(".ataFlair", "DataFlair");
        System.out.println(isMatch3);
        // These are the basic ways of using regex in Java
    }
}
Output
The first method of using regex
true
The second method of using regex
true
The third method of using regex
true
The output of the matches method depends on whether the pattern matches the entire character sequence passed to it. The dot represents a single character. That is why .ataFlair matches DataFlair: the dot (.) matches the character D. Any other single character in place of D (other than a line terminator) would also have resulted in true. However, any change in the length of the sequence, or the presence of characters other than those specified, would result in false.
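The program above covers compile(), matcher() and matches(). The flags overload of compile(), split() with a limit, and quote() can be sketched in the same way. The class PatternSketch and its helper names are hypothetical and chosen only for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PatternSketch {
    // Whole-input match that ignores letter case, using the flags overload of compile().
    static boolean matchesIgnoringCase(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    // Splits around commas with optional surrounding whitespace.
    // A limit of 0 means "no limit"; a positive limit caps the number of pieces.
    static String[] splitOnCommas(String input, int limit) {
        return Pattern.compile("\\s*,\\s*").split(input, limit);
    }

    // Searches for the literal text, so characters such as + are not treated as operators.
    static boolean containsLiteral(String literal, String input) {
        return Pattern.compile(Pattern.quote(literal)).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesIgnoringCase("dataflair", "DataFlair"));   // true
        System.out.println(Arrays.toString(splitOnCommas("a, b ,c,d", 0)));   // [a, b, c, d]
        System.out.println(Arrays.toString(splitOnCommas("a, b ,c,d", 2)));   // [a, b ,c,d]
        System.out.println(containsLiteral("1+1", "1+1=2"));                  // true
        // Without quote(), "1+1" means "one or more 1s followed by a 1":
        System.out.println(Pattern.compile("1+1").matcher("1+1=2").find());   // false
    }
}
```

Compiling the pattern once and reusing it, as these helpers could do with a static final field, avoids recompiling on every call when the same regex is used many times.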
4. Pattern Syntax Exception Class in Java
Errors may occur when compiling a regular expression. PatternSyntaxException is an unchecked exception that is thrown to indicate a syntax error in a regular expression pattern. Some of its methods are:
a. getDescription()
This method returns the description of the error found in the pattern. It has the syntax:
String getDescription()
b. getIndex()
This method returns the approximate index in the pattern at which the error occurred, or -1 if the index is not known. It is useful for pointing out the faulty position in the pattern. The syntax is:
int getIndex()
c. getMessage()
This method returns the entire error message containing the description of the error, the pattern of the regular expression(if it has errors), and a visual indication of the error-index within the pattern. The syntax of the following is:
String getMessage()
d. getPattern()
This method returns the erroneous regular expression pattern that caused the exception. It has the syntax:
String getPattern()
Java program to illustrate the use of the PatternSyntaxException class in Java:
package com.dataflair.regexjava;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class BasicMethod {
    private static String REGEX = "["; // invalid on purpose: the character class is never closed
    private static String INPUT = "DataFlair is a great place to learn "
            + "DataFlair has a wide range of articles!.";
    private static String REPLACE = "Java";

    public static void main(String[] args) {
        try {
            Pattern pattern = Pattern.compile(REGEX);
            // get a matcher object
            Matcher matcher = pattern.matcher(INPUT);
            INPUT = matcher.replaceAll(REPLACE);
        } catch (PatternSyntaxException e) {
            System.out.println("PatternSyntaxException: ");
            System.out.println("Description: " + e.getDescription());
            System.out.println("Index: " + e.getIndex());
            System.out.println("Message: " + e.getMessage());
            System.out.println("Pattern: " + e.getPattern());
        }
    }
}
Output
PatternSyntaxException: 
Description: Unclosed character class
Index: 0
Message: Unclosed character class near index 0
[
^
Pattern: [
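A common use of this exception is checking a user-supplied regex before using it. The following is a minimal sketch under that assumption; the class RegexValidator and the method describeProblem are illustrative names only.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexValidator {
    // Returns null when the regex compiles, otherwise a short
    // description of the problem built from the exception's getters.
    static String describeProblem(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription() + " (index " + e.getIndex() + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(describeProblem("[a-z]+")); // null, the pattern is valid
        System.out.println(describeProblem("["));      // a description of the syntax error
    }
}
```

Because PatternSyntaxException is unchecked, the compiler does not force the try/catch; it is only worth adding where the pattern comes from outside the program.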
Character Class in Java
Before we progress further into the article, it is essential that we know about the concept of a character class in Java regex. With the help of character classes, we can search for specific characters. A character class tells the regex engine to match any one character from a set specified by the programmer.
For example, if you have to search for a character which can be any one of h, k or i, you can simply enclose these three characters within square brackets: [hki]. This tells the regex engine to match h, k or i at that position. If the input character is one of them, the match succeeds; if not, it fails. One can add additional rules based on the requirements of the program.
There are certain characters that you need to be familiar with:
1. (^) – This token negates the class, i.e. it matches the complement of the listed characters.
2. (-) – This token denotes a range of characters.
3. (&&) – This token intersects a character class with another character class.
Let us look at some examples of the regex character classes in Java.
a. [abc] – This represents a simple class that matches a, b or c.
b. [^abc] – This represents any character except a, b or c.
c. [a-zA-Z] – This represents any character from a-z or A-Z.
d. [a-c[w-z]] – This represents characters from a-c union w-z.
e. [a-x&&[def]] – This represents the characters d, e or f. This is an example of intersection.
f. [a-z&&[^de]] – This represents all characters from a to z except d and e.
g. [a-z&&[^m-p]] – This represents the characters from a-z and none of the characters from m to p.
Java program to illustrate the use of Character class in Java:
package com.dataflair.regexjava;

import java.util.regex.*;

public class CharacterClassRegex {
    public static void main(String[] args) {
        boolean isMatch = Pattern.matches("[a-z]", "a");
        // true because a lies in the mentioned range
        System.out.println(isMatch);

        isMatch = Pattern.matches("[a-zA-Z0-9]", "D");
        // true because D satisfies the mentioned conditions
        System.out.println(isMatch);

        isMatch = Pattern.matches("[a-zA-Z&&[^pqn]]", "p");
        // false because the expression excludes p
        System.out.println(isMatch);
    }
}
Output
true
true
false
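The union, intersection and subtraction examples from the list above can be checked with a short sketch. The class CharClassSketch and the helper inClass are hypothetical names used only here.

```java
import java.util.regex.Pattern;

class CharClassSketch {
    // True when the single-character string ch belongs to the given character class.
    static boolean inClass(String charClass, String ch) {
        return Pattern.matches(charClass, ch);
    }

    public static void main(String[] args) {
        System.out.println(inClass("[^abc]", "d"));        // true  (negation)
        System.out.println(inClass("[a-c[w-z]]", "x"));    // true  (union)
        System.out.println(inClass("[a-c[w-z]]", "m"));    // false (in neither range)
        System.out.println(inClass("[a-x&&[def]]", "e"));  // true  (intersection)
        System.out.println(inClass("[a-x&&[def]]", "a"));  // false (not in [def])
        System.out.println(inClass("[a-z&&[^m-p]]", "n")); // false (subtracted range)
        System.out.println(inClass("[a-z&&[^m-p]]", "q")); // true
    }
}
```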
Java Regex Quantifiers
If you have to specify the number of character occurrences too then you can use quantifiers to add additional conditions on the characters.
Some of the quantifiers are:
- A? – Checks if A occurs once or not at all.
- A+ – Checks if A occurs one or more times.
- A* – Checks if A occurs zero or more times.
- A{n} – Checks if A occurs exactly n times.
- A{n,} – Checks if A occurs n or more times.
- A{m,n} – Checks if A occurs at least m times but not more than n times.
Java program to illustrate the usage of Quantifiers:
package com.dataflair.regexjava;

import java.util.regex.*;

public class PatternJavaQuantRegex {
    public static void main(String[] args) {
        boolean isMatch = Pattern.matches("[jav]?", "java");
        // false because [jav]? matches at most one character, but "java" has four
        System.out.println(isMatch);

        isMatch = Pattern.matches("[jav]+", "java");
        // true because the input is one or more characters, each of them j, a or v
        System.out.println(isMatch);

        isMatch = Pattern.matches("[jav]*", "javaaa");
        // true because the input is zero or more characters, each of them j, a or v
        System.out.println(isMatch);
    }
}
Output
false
true
true
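The program above exercises ?, + and *. The counted forms {n}, {n,} and {m,n} can be sketched as follows. The class QuantifierSketch, its method names and the formats being checked (a six-digit hex color, a 4 to 6 digit PIN) are assumptions made for illustration.

```java
import java.util.regex.Pattern;

class QuantifierSketch {
    // {n}: exactly six hexadecimal digits after '#'.
    static boolean isHexColor(String s) {
        return Pattern.matches("#[0-9a-fA-F]{6}", s);
    }

    // {m,n}: between four and six digits, both bounds inclusive.
    static boolean isPin(String s) {
        return Pattern.matches("\\d{4,6}", s);
    }

    // {n,}: at least eight non-whitespace characters.
    static boolean isLongEnough(String s) {
        return Pattern.matches("\\S{8,}", s);
    }

    public static void main(String[] args) {
        System.out.println(isHexColor("#1a2B3c"));   // true
        System.out.println(isHexColor("#1a2B3"));    // false (only five digits)
        System.out.println(isPin("123456"));         // true  (upper bound is inclusive)
        System.out.println(isPin("1234567"));        // false (seven digits)
        System.out.println(isLongEnough("regex123")); // true (eight characters)
    }
}
```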
Java Replacement Methods
These methods have the function of replacing text inside an input string. In regex programming, they are very important. Some of these methods are:
1. appendReplacement()
This method appends to the string buffer the input text between the previous match and the current one, followed by the replacement for the current match. It is used in a loop with find(), and appendTail() adds the remaining input afterwards. This has the syntax:
public Matcher appendReplacement(StringBuffer sb, String replacement)
2. replaceAll()
This method replaces every subsequence of the input that matches the pattern with the given replacement string. Its syntax is:
public String replaceAll(String replacement)
3. replaceFirst()
This method, unlike the replaceAll method, replaces only the first subsequence of the input that matches the compiled pattern with the given replacement string. This has a syntax:
public String replaceFirst(String replacement)
4. quoteReplacement()
The method quoteReplacement returns a literal replacement String for the specified string. In the result, the characters $ and \ lose their special meaning, so it can be passed safely to appendReplacement, replaceAll or replaceFirst. The syntax of this method is:
public static String quoteReplacement(String str);
Java program to illustrate the use of replacement methods in Java. It uses the convenience methods of the String class; String.replaceAll and String.replaceFirst compile the regex and call the Matcher methods of the same name:
package com.dataflair.regexjava;

import java.util.regex.*;

public class ReplaceMethods {
    public static void main(String args[]) {
        String orgstr = "the quick brown fox jumped over the lazy dog";
        System.out.println("Original String is: " + orgstr);
        System.out.println("String after replacing 'fox' with 'dog': "
                + orgstr.replace("fox", "dog"));
        System.out.println("String after replacing all 't' with 'a': "
                + orgstr.replace('t', 'a'));

        String sent = "DataFlair is a great place to go for learning";
        // Remove white spaces from the sentence
        String str2 = sent.replaceAll("\\s", "");
        System.out.println(str2);

        sent = "This website also provides free tutorials on Java,Python and even Machine Learning! ";
        // Replace only the first 's' with 'r' in the entire sentence.
        String str1 = sent.replaceFirst("s", "r");
        System.out.println(str1);
    }
}
Output
Original String is: the quick brown fox jumped over the lazy dog
String after replacing 'fox' with 'dog': the quick brown dog jumped over the lazy dog
String after replacing all 't' with 'a': ahe quick brown fox jumped over ahe lazy dog
DataFlairisagreatplacetogoforlearning
Thir website also provides free tutorials on Java,Python and even Machine Learning! 
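The Matcher methods appendReplacement() and quoteReplacement() described earlier do not appear in the program above, so here is a small sketch of how they are typically combined. The class ReplacementSketch and its two helpers are hypothetical; appendTail() is the Matcher method that copies the rest of the input after the last match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplacementSketch {
    // Doubles every integer in the text. Each replacement is computed
    // from the match itself, which a plain replaceAll(String) cannot do.
    static String doubleNumbers(String text) {
        Matcher m = Pattern.compile("\\d+").matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String doubled = String.valueOf(2 * Integer.parseInt(m.group()));
            m.appendReplacement(sb, Matcher.quoteReplacement(doubled));
        }
        m.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }

    // Replaces every match with the replacement taken literally,
    // so '$' and '\' in it are not interpreted as group references or escapes.
    static String replaceLiterally(String text, String regex, String replacement) {
        return Pattern.compile(regex).matcher(text)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("3 apples and 10 pears")); // 6 apples and 20 pears
        System.out.println(replaceLiterally("cost: X", "X", "$5")); // cost: $5
    }
}
```

Without quoteReplacement(), the replacement "$5" would be read as a reference to capturing group 5, which does not exist in this pattern.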
Metacharacters in Java
These are the metacharacters that act as shortcodes. They represent a special feature while specifying regular expressions. Each of them is equally important and pairing them up with character classes can create very effective regular expressions. These represent a variety of information as we will be seeing now:
1. Dot (.) – This represents any single character. By default it does not match line terminators unless the DOTALL flag is set.
2. \d – This represents digits, i.e. [0-9]
3. \D – This represents non-digits, which is short for [^0-9]
4. \s – This represents any whitespace character, e.g. space, \t, \n, \r etc.
5. \S – This represents any non-whitespace character, i.e. [^\s]
6. \w – This represents a word character, i.e. [a-zA-Z_0-9]
7. \W – This represents a non-word character, i.e. [^\w]
8. \b – This represents a word boundary.
9. \B – This represents a non-word boundary.
Java program to illustrate the use of Java metacharacters:
package com.dataflair.regexjava;

import java.util.regex.*;

public class MetaCharacterRegex {
    public static void main(String[] args) {
        System.out.println(Pattern.matches("\\d", "dataflair")); // false (non-digit)
        System.out.println(Pattern.matches("\\d", "1"));         // true (digit and comes once)
        System.out.println(Pattern.matches("\\d", "78643"));     // false (digit but comes more than once)
        System.out.println(Pattern.matches("\\d", "3723ytec"));  // false (digit and char)
        System.out.println(Pattern.matches("\\D", "abc"));       // false (non-digit but comes more than once)
        System.out.println(Pattern.matches("\\D", "9"));         // false (digit)
        System.out.println(Pattern.matches("\\D", "12873"));     // false (digit)
        System.out.println(Pattern.matches("\\D", "390()3@bc")); // false (digit and char)
        System.out.println(Pattern.matches("\\D", "("));         // true (non-digit and comes once)
    }
}
Output
false
true
false
false
false
false
false
false
true
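The program above covers \d and \D. The metacharacters \w, \s and \b from the list can be sketched in a similar way. The class MetaSketch and its helper methods are illustrative names, not library APIs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetaSketch {
    // Counts runs of word characters delimited by word boundaries.
    static int countWords(String text) {
        Matcher m = Pattern.compile("\\b\\w+\\b").matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Trims the text and replaces each run of whitespace with a single space.
    static String collapseWhitespace(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    // True when the string consists only of word characters [a-zA-Z_0-9].
    static boolean isWordOnly(String s) {
        return Pattern.matches("\\w+", s);
    }

    public static void main(String[] args) {
        System.out.println(countWords("regex in Java, made easy")); // 5
        System.out.println(collapseWhitespace("  a \t b\n c "));    // a b c
        System.out.println(isWordOnly("user_42"));                  // true
        System.out.println(isWordOnly("user-42"));                  // false ('-' is not a word character)
    }
}
```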
Example of Regular Expressions in Java
package com.dataflair.regexjava;

import java.util.regex.*;

public class JavaRegexExample {
    public static void main(String args[]) {
        System.out.println("We will evaluate phone numbers in this program");
        System.out.println("Each phone number should be of 10 digits and should start with 4,5,6");
        System.out.println("We will take the help of character class moderators. ");

        System.out.println("Is 5953038949 a valid number? "
                + Pattern.matches("[456]{1}[0-9]{9}", "5953038949")); // true
        System.out.println("Is 59537657755949 a valid number? "
                + Pattern.matches("[456][0-9]{9}", "59537657755949")); // false because more than 10 characters
        System.out.println("Is 99530112290 a valid number? "
                + Pattern.matches("[456][0-9]{9}", "99530112290")); // false (11 characters and starts with 9)
        System.out.println("Is 6953038949 a valid number? "
                + Pattern.matches("[456][0-9]{9}", "6953038949")); // true (starts with 6)
        System.out.println("Is 4333038949 a valid number? "
                + Pattern.matches("[456][0-9]{9}", "4333038949") + "\n"); // true (starts with 4)

        System.out.println("With the help of metacharacters!! \n");
        System.out.println("Is 6353038949 a valid number? "
                + Pattern.matches("[456]{1}\\d{9}", "6353038949")); // true
        System.out.println("Is 3853038949 a valid number? "
                + Pattern.matches("[456]{1}\\d{9}", "3853038949")); // false (starts with 3)
        System.out.println("The metacharacters help in identifying only digits i.e, [0-9]");
    }
}
Output
We will evaluate phone numbers in this program
Each phone number should be of 10 digits and should start with 4,5,6
We will take the help of character class moderators.
Is 5953038949 a valid number? true
Is 59537657755949 a valid number? false
Is 99530112290 a valid number? false
Is 6953038949 a valid number? true
Is 4333038949 a valid number? true

With the help of metacharacters!! 

Is 6353038949 a valid number? true
Is 3853038949 a valid number? false
The metacharacters help in identifying only digits i.e, [0-9]
Java Facts and Observations
a. compile() is a static method in the Pattern Class.
b. The Pattern class has no public constructor. Instead, we obtain an object of type Pattern by calling the compile() method.
c. We also create an object of type Matcher by calling the matcher() method of a Pattern.
d. Pattern.matches() is also a static method; it checks whether the pattern provided matches the entire text.
e. When we want to find multiple occurrences of a pattern in a text sequence, we use the find() method.
f. The split() method splits a text based on a delimiter pattern in the text.
Summary
In this article, we learned about the various classes and methods that together make up the java.util.regex package. Regular expressions are frequently used in search engines and distributed systems. Hence a strong grasp of the concept is extremely important for future developers.