Regular Expressions
Regular expressions are a powerful language for describing and manipulating text.
Using regular expressions is also called pattern matching, which involves comparing one string to another, or comparing a series of wildcards that represent a type of string to a literal string.
A regular expression is applied to a string.
Regular expressions are commonly used to parse text files and to search for and replace substrings.
Regular expressions are text patterns that are used for string matching.
The result of applying a regular expression to a string is either a substring of the original, or a new string representing a modification of some part of the original string.
Remember that strings are immutable and so cannot be changed by the regular expression.
A regular expression consists of two types of characters:
literals - the characters you want to match in the target string
meta-characters - special symbols that act as a command to the regular expression parser.
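For example, take the regular expression
^(From|To|Subject|Date):
This matches any of the words From, To, Subject or Date, provided the word appears at the start of the string (^) and is immediately followed by a colon. With the Multiline option, ^ also matches at the start of each line.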
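As a minimal sketch of how this pattern behaves, the snippet below tests a few sample lines with Regex.IsMatch. The class name HeaderPatternDemo, the helper IsHeaderLine and the sample strings are illustrative assumptions, not part of the original text.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeaderPatternDemo
{
    // Hypothetical helper: true when the line starts with one of the
    // header names and that name is immediately followed by a colon.
    public static bool IsHeaderLine(string line)
    {
        return Regex.IsMatch(line, "^(From|To|Subject|Date):");
    }

    public static void Main()
    {
        Console.WriteLine(IsHeaderLine("Subject: Meeting"));     // True
        Console.WriteLine(IsHeaderLine("Re: Subject: Meeting")); // False, "Subject:" is not at the start
        Console.WriteLine(IsHeaderLine("Subject Meeting"));      // False, the colon is missing
    }
}
```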
Using regular expressions removes the need to use the Visual Basic LIKE operator as well as the Replace string function.
It is important to remember that the String.Replace method only replaces literal characters or substrings; it cannot match a pattern.
Examples
[0-9] | any single digit |
[^0-9] | any single character that is not a digit |
[a-z] | any single lowercase letter |
[A-Z] | any single uppercase letter |
[a-zA-Z0-9] | any single alphanumeric character |
Character / Meta Character Table
^ | Will match an expression at the beginning of the string (^A) means the string must start with the character A. |
$ | Will match the end of a string. (A$) means the string must finish with the character A. |
| | Will match either expression. (a|b) means either the character a or the character b |
. | Will match any single character except a newline |
* | Will match the character or group on the left of the asterisk zero or more times |
+ | Will match the character or group on the left of the plus sign one or more times |
? | Will match the character or group on the left of the question mark zero or one time |
() | Parentheses group part of the expression, which affects the order of evaluation and captures the matched text |
\ | Escapes the next character so that a meta-character is matched as a literal character |
\b | word boundary |
\t | tab character |
\s | any whitespace |
\d | decimal digit |
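The quantifiers and escapes in the table can be tried out with a short sketch like the one below. The class QuantifierDemo, the helper FirstMatch and the sample inputs are assumptions made for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Hypothetical helper: returns the first match of pattern in input,
    // or an empty string when there is no match.
    public static string FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        Console.WriteLine(FirstMatch("caaat", "ca*t"));          // caaat (* allows zero or more 'a')
        Console.WriteLine(FirstMatch("ct", "ca*t"));             // ct
        Console.WriteLine(FirstMatch("ct", "ca+t").Length);      // 0 (+ needs at least one 'a')
        Console.WriteLine(FirstMatch("color", "colou?r"));       // color (? makes the 'u' optional)
        Console.WriteLine(FirstMatch("colour", "colou?r"));      // colour
        Console.WriteLine(FirstMatch("order 42", @"\d+"));       // 42
        Console.WriteLine(FirstMatch("concat cat", @"\bcat\b")); // cat (only the standalone word)
    }
}
```

The verbatim string prefix (@) is used for patterns containing a backslash so that C# does not treat the backslash as its own escape character.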
Examples
\. | Full Stop |
\? | Question Mark |
\/ | Forward Slash / |
\\ | Backward Slash \ |
\' | Single Quote |
\" | Double Quote |
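When the literal text comes from a variable, the escaping can be left to the Regex.Escape method instead of being typed by hand. The sketch below is only an illustration; the class EscapeDemo, the helper ContainsLiteral and the sample file name are assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Hypothetical helper: searches for literal text by escaping any
    // meta-characters it contains before using it as a pattern.
    public static bool ContainsLiteral(string input, string literal)
    {
        return Regex.IsMatch(input, Regex.Escape(literal));
    }

    public static void Main()
    {
        Console.WriteLine(Regex.IsMatch("file1-txt", "file1.txt"));   // True, the unescaped . matches '-'
        Console.WriteLine(Regex.IsMatch("file1-txt", @"file1\.txt")); // False, \. only matches a full stop
        Console.WriteLine(Regex.Escape("file1.txt?"));                // file1\.txt\?
        Console.WriteLine(ContainsLiteral("open file1.txt now", "file1.txt")); // True
        Console.WriteLine(ContainsLiteral("open file1-txt now", "file1.txt")); // False
    }
}
```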
Regex Class
This is the .NET Framework object-oriented approach to regular expression matching and replacement.
The following namespace is the home to all the .NET framework objects associated with regular expressions:
System.Text.RegularExpressions
The main class is the Regex class.
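Common methods and properties of the Regex class:
IsMatch | Returns true if the pattern is found in the input string |
Match | Returns the first match as a Match object |
Matches | Returns all the matches as a MatchCollection |
Replace | Returns a new string with the matches replaced |
Split | Splits the input string into an array at each match |
Options | Returns the RegexOptions that were passed to the constructor |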
Splitting a text string into an array
System.Text.RegularExpressions.Regex regRegex;
string sTextToSearch;
string sPattern;
sPattern = ",";
sTextToSearch = "one,two,three,four";
regRegex = new System.Text.RegularExpressions.Regex(sPattern);
// Split returns a string array with one element per comma-separated item
foreach (string sSubString in regRegex.Split(sTextToSearch))
{
    System.Console.WriteLine(sSubString);
}
Does a substring exist?
if (System.Text.RegularExpressions.Regex.IsMatch(input, pattern))
{
    // the pattern was found somewhere in input
}
Removing all the carriage returns from a text string
You do not need to create a Regex object; the static Regex.Replace method can be called directly.
stextstring = System.Text.RegularExpressions.Regex.Replace(stextstring, "\r", "");
VBA Equivalent
regRegex = New System.Text.RegularExpressions.Regex("\r")
For Each slineoftext In regRegex.Split(sfilecontents)
Next slineoftext
For Each slineoftext In Str_ToCollection(sfilecontents,",")
Next slineoftext
RegEx Options
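Compiled | Compiles the expression to MSIL instead of interpreting it, giving faster matching at the cost of a slower start-up |
IgnoreCase | Matching is not case sensitive |
Multiline | ^ and $ match at the beginning and end of every line, not just the beginning and end of the whole string |
None | No options are set |
RightToLeft | The search moves from right to left instead of from left to right |
Singleline | The . character matches every character, including the newline character |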
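The effect of these options can be seen by counting matches for the same pattern under different settings. This is a rough sketch; the class OptionsDemo, the helper CountMatches and the sample text are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionsDemo
{
    // Hypothetical helper: counts the matches of pattern in input
    // under the given combination of options.
    public static int CountMatches(string input, string pattern, RegexOptions options)
    {
        return Regex.Matches(input, pattern, options).Count;
    }

    public static void Main()
    {
        string text = "From: alice\nfrom: bob";

        // Without Multiline, ^ only matches at the very start of the string.
        Console.WriteLine(CountMatches(text, "^From:", RegexOptions.None));       // 1
        Console.WriteLine(CountMatches(text, "^From:", RegexOptions.IgnoreCase)); // 1

        // Options are combined with the | operator.
        Console.WriteLine(CountMatches(text, "^From:",
            RegexOptions.IgnoreCase | RegexOptions.Multiline));                   // 2

        // By default . does not match a newline; Singleline changes that.
        Console.WriteLine(CountMatches("a\nb", "a.b", RegexOptions.None));        // 0
        Console.WriteLine(CountMatches("a\nb", "a.b", RegexOptions.Singleline));  // 1
    }
}
```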
MatchCollection
There are two additional classes in the RegularExpressions namespace, MatchCollection and Match, that allow you to search a string repeatedly and return the results in a collection.
The collection returned is of type MatchCollection which consists of zero or more Match objects.
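A minimal sketch of walking a MatchCollection is shown below. The class MatchCollectionDemo, the helper FindNumbers and the sample sentence are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchCollectionDemo
{
    // Hypothetical helper: returns every run of digits found in the input.
    public static List<string> FindNumbers(string input)
    {
        var results = new List<string>();
        MatchCollection matches = Regex.Matches(input, @"\d+");
        foreach (Match match in matches)
        {
            // Each Match exposes the matched text and its position.
            results.Add(match.Value);
        }
        return results;
    }

    public static void Main()
    {
        foreach (string number in FindNumbers("10 apples, 25 pears, 3 plums"))
        {
            Console.WriteLine(number); // 10, then 25, then 3
        }

        // When nothing matches, the collection is empty rather than null.
        Console.WriteLine(FindNumbers("no digits here").Count); // 0
    }
}
```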
Regex Groups
It is often convenient to group subexpression matches together so that you can parse out pieces of the matching string.
The Group class allows you to create groups of matches based on regular expression syntax and represents the results from a single grouping expression.
CaptureCollection
Each time a Regex object matches a subexpression a Capture instance is created and added to a CaptureCollection collection.
Each group has its own capture collection, holding the matches for the subexpression associated with the group.
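The difference between a Group and its CaptureCollection is easiest to see when a group sits inside a repeated expression. The following sketch assumes hypothetical names (GroupCaptureDemo, HeaderName, CapturedWords) and sample strings chosen purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupCaptureDemo
{
    // Hypothetical helper: group 1 holds the text before the colon.
    public static string HeaderName(string line)
    {
        Match match = Regex.Match(line, @"^(\w+):\s*(.*)$");
        return match.Success ? match.Groups[1].Value : "";
    }

    // Hypothetical helper: the group is repeated, so it records
    // one Capture for every word it matched.
    public static List<string> CapturedWords(string input)
    {
        var words = new List<string>();
        Match match = Regex.Match(input, @"^(?:(\w+)\s?)+$");
        if (!match.Success)
        {
            return words;
        }
        foreach (Capture capture in match.Groups[1].Captures)
        {
            words.Add(capture.Value);
        }
        return words;
    }

    public static void Main()
    {
        Console.WriteLine(HeaderName("Subject: Weekly report")); // Subject

        Match match = Regex.Match("one two three", @"^(?:(\w+)\s?)+$");
        Console.WriteLine(match.Groups[1].Value);                  // three (the last capture)
        Console.WriteLine(CapturedWords("one two three").Count);   // 3
    }
}
```

Groups[0] always represents the whole match, so the first parenthesised subexpression is Groups[1].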
Extract text in brackets
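string TextBetween(
    string text,
    string start,
    string end)
{
    // Begin just after the first occurrence of start, or at 0 if start is not found
    int istart = text.IndexOf(start);
    istart = (istart == -1) ? 0 : istart + start.Length;
    // Finish at the last occurrence of end, or at the end of the text if end is not found
    int iend = text.LastIndexOf(end);
    if (iend == -1)
    {
        iend = text.Length;
    }
    // Guard against end appearing before start, which would give a negative length
    int ilen = System.Math.Max(0, iend - istart);
    return text.Substring(istart, ilen);
}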
text.Remove(text.IndexOf(']')).Substring(text.IndexOf('[') + 1);
Regex.Match(text,@"\(([^)]*)\)").Groups[1].Value
Important
If you plan to use the same regular expression repeatedly, you should create a Regex object once and reuse it.
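One way to follow this advice is to hold the Regex in a static read-only field, as in the sketch below. The class ReuseDemo, the field WhitespaceRun and the helper CollapseWhitespace are assumptions made for illustration; whether the Compiled option is worthwhile depends on how often the expression runs.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReuseDemo
{
    // Constructed once and reused by every call, rather than
    // passing the pattern string to a static method each time.
    private static readonly Regex WhitespaceRun =
        new Regex(@"\s+", RegexOptions.Compiled);

    // Hypothetical helper: trims the input and replaces each run
    // of whitespace with a single space.
    public static string CollapseWhitespace(string input)
    {
        return WhitespaceRun.Replace(input.Trim(), " ");
    }

    public static void Main()
    {
        string[] lines = { "one   two", "  three\tfour ", "five\r\nsix" };
        foreach (string line in lines)
        {
            Console.WriteLine(CollapseWhitespace(line)); // one two / three four / five six
        }
    }
}
```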
© 2024 Better Solutions Limited. All Rights Reserved.