Regular Expressions

Regular expressions are text patterns used for string matching, and provide a powerful language for describing and manipulating text.
Using regular expressions is also called pattern matching, which involves comparing a pattern (a literal string, or a series of wildcards that represent a type of string) to a target string.
A regular expression is always applied to a string.
Regular expressions are commonly used to parse text files as well as to search for and replace substrings.


Applying a regular expression to a string returns either the part of the string that matched or a new string representing a modification of some part of the original string.
Remember that strings are immutable, so the original string is never changed by the regular expression.


A regular expression consists of two types of characters:

  • literals - the characters you want to match in the target string

  • meta-characters - special symbols that act as a command to the regular expression parser.

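For example, the regular expression
^(From|To|Subject|Date):
will match "From", "To", "Subject" or "Date" when the word appears at the start of the string (^) and is immediately followed by a colon. With the Multiline option, ^ also matches at the start of each line.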
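
The header pattern above can be tried out with a minimal sketch like the following. The module name, the function name GetHeaderName and the sample strings are illustrative assumptions, not part of the original notes.

```vb
Imports System
Imports System.Text.RegularExpressions

Module HeaderPatternDemo

    ' Hypothetical helper: returns the header name (From, To, Subject or Date)
    ' when the line starts with one of them followed by a colon,
    ' otherwise returns an empty string.
    Function GetHeaderName(ByVal sLine As String) As String
        Dim objMatch As Match = Regex.Match(sLine, "^(From|To|Subject|Date):")
        If objMatch.Success Then
            ' Groups(1) holds the text matched inside the parentheses
            Return objMatch.Groups(1).Value
        End If
        Return String.Empty
    End Function

    Sub Main()
        Console.WriteLine(GetHeaderName("Subject: Hello"))       ' Subject
        Console.WriteLine(GetHeaderName("Re: Subject: Hello"))   ' (empty line)
    End Sub

End Module
```

The second call returns nothing because ^ anchors the match to the start of the string, and the string starts with "Re:".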


Using regular expressions removes the need to use the Visual Basic Like operator as well as the Replace string function.
It is important to remember that the String.Replace method only supports replacement of a literal character or substring, not a pattern.
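
As a rough illustration of a replacement that is driven by a pattern rather than by fixed text, the sketch below collapses runs of whitespace using Regex.Replace. The function name CollapseWhitespace and the sample text are assumptions made for this example.

```vb
Imports System
Imports System.Text.RegularExpressions

Module ReplaceDemo

    ' Hypothetical helper: replaces every run of one or more
    ' whitespace characters with a single space.
    Function CollapseWhitespace(ByVal sText As String) As String
        Return Regex.Replace(sText, "\s+", " ")
    End Function

    Sub Main()
        Console.WriteLine(CollapseWhitespace("one   two     three"))   ' one two three
    End Sub

End Module
```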


Examples

[0-9] - any numeric character
[^0-9] - any non-numeric character
[a-z] - any lowercase letter
[A-Z] - any uppercase letter
[a-zA-Z0-9] - any alphanumeric character


Character / Meta Character Table

^ - Will match at the beginning of the string. (^A) means the string must start with the character A.
$ - Will match at the end of the string. (A$) means the string must finish with the character A.
| - Will match either expression. (a|b) means either the character a or the character b.
. - Will match any single character except a newline.
* - Will match the item on the left of the asterisk 0 or more times.
+ - Will match the item on the left of the plus sign 1 or more times.
? - Will match the item on the left of the question mark 0 or 1 times.
() - Parentheses group part of the expression, which affects the order in which it is evaluated, and capture the text they match.
\ - Allows us to specify literal characters. (\.) means match the full stop character. (\\) means match the backslash character.
\b - word boundary
\t - tab character
\s - any whitespace character
\d - any decimal digit
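
A few of the meta-characters can be checked with a short sketch such as the one below. The helper name Test and the sample strings are assumptions chosen for illustration. Note that ? allows zero or one occurrence, * allows zero or more, and + requires at least one.

```vb
Imports System
Imports System.Text.RegularExpressions

Module MetaCharacterDemo

    ' Hypothetical helper: True if sPattern matches somewhere in sText.
    Function Test(ByVal sText As String, ByVal sPattern As String) As Boolean
        Return Regex.IsMatch(sText, sPattern)
    End Function

    Sub Main()
        Console.WriteLine(Test("color", "^colou?r$"))    ' True  (? = 0 or 1 "u")
        Console.WriteLine(Test("ac", "^ab*c$"))          ' True  (* = 0 or more "b")
        Console.WriteLine(Test("ac", "^ab+c$"))          ' False (+ = 1 or more "b")
        Console.WriteLine(Test("abbc", "^ab+c$"))        ' True
        Console.WriteLine(Test("3.14", "^\d\.\d+$"))     ' True  (\. = literal full stop)
        Console.WriteLine(Test("3x14", "^\d\.\d+$"))     ' False
    End Sub

End Module
```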



Regex Class

This is the .NET Framework object-oriented approach to regular expression matching and replacement.
The following namespace is home to all the .NET Framework classes associated with regular expressions:

System.Text.RegularExpressions 

The main class is the Regex class.


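Common properties and methods

Regex.IsMatch - returns True if the pattern is found in the input string
Regex.Match - returns the first match as a Match object
Regex.Matches - returns every match as a MatchCollection
Regex.Replace - returns a new string with the matched text replaced
Regex.Split - splits the input string into a string array at the positions matched by the pattern
Regex.Options - returns the RegexOptions that were passed to the constructor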



Splitting a text string into an array

Dim regRegex As System.Text.RegularExpressions.Regex
Dim sTextToSearch As String
Dim sPattern As String

   sPattern = ","
   sTextToSearch = "one,two,three,four"
   regRegex = New System.Text.RegularExpressions.Regex(sPattern)

   'Split returns a string array: "one", "two", "three", "four"
   For Each sSubString As String In regRegex.Split(sTextToSearch)

   Next sSubString

Does a substring exist?

'IsMatch returns True if the pattern is found anywhere in the input string
If System.Text.RegularExpressions.Regex.IsMatch(input, pattern) Then
End If


Removing all the carriage returns from a text string

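You do not need to create a Regex object because Replace is also available as a shared method.
The pattern "\r" matches carriage returns only; use "[\r\n]" to remove line feeds as well.

stextstring = System.Text.RegularExpressions.Regex.Replace(stextstring, "\r", "")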


VBA Equivalent


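'.NET: split the file contents on carriage returns
regRegex = New System.Text.RegularExpressions.Regex("\r")
For Each slineoftext In regRegex.Split(sfilecontents)

Next slineoftext

'VBA: Str_ToCollection is not a built-in VBA function; it is a user-defined helper (not shown) that is passed the delimiter
For Each slineoftext In Str_ToCollection(sfilecontents,",")

Next slineoftext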


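Regex Options

These are members of the RegexOptions enumeration and can be combined with Or.

Compiled - compiles the regular expression to MSIL; matching is faster but start-up is slower
IgnoreCase - case-insensitive matching
Multiline - ^ and $ match at the start and end of each line, not just of the whole string
None - no options are set
RightToLeft - the search moves from right to left
Singleline - the full stop (.) matches every character, including the newline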
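
Options are passed to the Regex constructor (or to the shared methods) and can be combined. The following is a minimal sketch; the function name CountHeaders, the pattern and the sample text are assumptions made for illustration.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module RegexOptionsDemo

    ' Hypothetical helper: counts the lines that start with "from:" or "to:".
    ' IgnoreCase makes the match case-insensitive.
    ' Multiline lets ^ match at the start of every line.
    Function CountHeaders(ByVal sText As String) As Integer
        Dim objRegex As New Regex("^(from|to):", RegexOptions.IgnoreCase Or RegexOptions.Multiline)
        Return objRegex.Matches(sText).Count
    End Function

    Sub Main()
        Dim sText As String = "From: alice" & vbLf & "TO: bob" & vbLf & "Body text"
        Console.WriteLine(CountHeaders(sText))   ' 2
    End Sub

End Module
```

Without the two options the same pattern would find no matches in this sample, because "from" and "to" are lowercase in the pattern and ^ would only match at the very start of the string.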

MatchCollection

The RegularExpressions namespace contains two additional classes, Match and MatchCollection, that allow you to search a string repeatedly and return the results in a collection.
The collection returned by Regex.Matches is of type MatchCollection, which consists of zero or more Match objects.
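
Looping over a MatchCollection might look like the sketch below. The function name FindNumbers and the sample sentence are assumptions for the purpose of the example.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module MatchCollectionDemo

    ' Hypothetical helper: returns every run of digits found in the text.
    Function FindNumbers(ByVal sText As String) As List(Of String)
        Dim colResults As New List(Of String)
        Dim objMatches As MatchCollection = Regex.Matches(sText, "[0-9]+")

        ' Each item in the collection is a Match object
        For Each objMatch As Match In objMatches
            colResults.Add(objMatch.Value)
        Next objMatch

        Return colResults
    End Function

    Sub Main()
        ' Prints 10, 25 and 3 on separate lines
        For Each sNumber As String In FindNumbers("10 apples, 25 pears and 3 plums")
            Console.WriteLine(sNumber)
        Next sNumber
    End Sub

End Module
```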



Regex Groups

It is often convenient to group subexpression matches together so that you can parse out pieces of the matching string.
The Group class allows you to create groups of matches based on regular expression syntax and represents the results from a single grouping expression.
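
As a small sketch of reading groups from a match, the example below splits a date-like string into its three parts. The function name SplitDate and the date format are assumptions. Groups(0) is always the whole match and the numbered groups follow the order of the opening parentheses.

```vb
Imports System
Imports System.Text.RegularExpressions

Module GroupsDemo

    ' Hypothetical helper: returns the three numeric parts of a string
    ' such as "2020-06-15", or an empty array if the text does not match.
    Function SplitDate(ByVal sDate As String) As String()
        Dim objMatch As Match = Regex.Match(sDate, "^(\d+)-(\d+)-(\d+)$")
        If Not objMatch.Success Then Return New String() {}

        Return New String() {objMatch.Groups(1).Value,
                             objMatch.Groups(2).Value,
                             objMatch.Groups(3).Value}
    End Function

    Sub Main()
        Dim arParts As String() = SplitDate("2020-06-15")
        Console.WriteLine(arParts(0))   ' 2020
        Console.WriteLine(arParts(1))   ' 06
        Console.WriteLine(arParts(2))   ' 15
    End Sub

End Module
```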



CaptureCollection

Each time a Regex object matches a subexpression a Capture instance is created and added to a CaptureCollection collection.
Each group has its own capture collection of the matches for the subexpression associated with the group.
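
When a group sits inside a repeated part of the pattern it can match several times, and every one of those matches is kept in the group's Captures collection. The sketch below illustrates this; the function name GetCaptures and the pattern are assumptions, and (?: ) is a non-capturing group used here only so that the commas are not captured.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module CaptureDemo

    ' Hypothetical helper: returns every word captured by group 1
    ' from a comma separated list of lowercase words.
    Function GetCaptures(ByVal sText As String) As List(Of String)
        Dim colResults As New List(Of String)
        Dim objMatch As Match = Regex.Match(sText, "^(?:([a-z]+),?)+$")

        If objMatch.Success Then
            ' Groups(1).Value only holds the last capture ("three" below),
            ' Groups(1).Captures holds all of them
            For Each objCapture As Capture In objMatch.Groups(1).Captures
                colResults.Add(objCapture.Value)
            Next objCapture
        End If

        Return colResults
    End Function

    Sub Main()
        ' Prints one, two and three on separate lines
        For Each sWord As String In GetCaptures("one,two,three")
            Console.WriteLine(sWord)
        Next sWord
    End Sub

End Module
```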


Extract text in brackets

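// C#: returns the text between the first occurrence of start and the last occurrence of end
string TextBetween(
   string text,
   string start,
   string end)
{
   int istart = text.IndexOf(start);
   istart = (istart == -1) ? 0 : istart + start.Length;
   int iend = text.LastIndexOf(end);
   if (iend == -1)
   {
      iend = text.Length;
   }
   // guard against end appearing before start
   int ilen = (iend > istart) ? iend - istart : 0;
   return text.Substring(istart, ilen);
}

// text between the first '[' and the first ']'
text.Remove(text.IndexOf(']')).Substring(text.IndexOf('[') + 1);

// text between the first '(' and the next ')' using a capturing group
Regex.Match(text,@"\(([^)]*)\)").Groups[1].Value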


Important

If you plan to use the same regular expression repeatedly, you should create a Regex object.
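
One way to follow this advice is to create the Regex object once and reuse it for every call, as in the sketch below. The module-level variable m_objDigits and the function name CountLinesWithDigits are assumptions made for illustration.

```vb
Imports System
Imports System.Text.RegularExpressions

Module ReuseDemo

    ' Created once when the module is first used and then reused
    Private ReadOnly m_objDigits As New Regex("[0-9]+")

    ' Hypothetical helper: counts the lines that contain at least one digit.
    Function CountLinesWithDigits(ByVal arLines As String()) As Integer
        Dim iCount As Integer = 0
        For Each sLine As String In arLines
            If m_objDigits.IsMatch(sLine) Then iCount += 1
        Next sLine
        Return iCount
    End Function

    Sub Main()
        Dim arLines As String() = {"abc", "a1", "22"}
        Console.WriteLine(CountLinesWithDigits(arLines))   ' 2
    End Sub

End Module
```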


© 2020 Better Solutions Limited. All Rights Reserved.