VBScript Excel Macro for Parsing URL Strings
This VBScript (Microsoft Excel macro) function is based on part of a script that organizes a list of URLs.
The script takes a list of URLs and extracts the top-level and second-level domains using regular expression matching. For example, it extracts microsoft.com
out of a URL such as msdn.microsoft.com/en-us/aa570309.aspx
.
The company this was written for had a set of URLs gathered from an outside data source indicating the most-trafficked websites in their industry and formatted as an ordered range of cells in Excel. They needed to compare this set of URLs against a list of partner URLs to visualize their partner coverage in different geographical areas.
To do this comparison, the URLs needed to be standardized to show only the top-level and second-level domains. The final step (which is not shown here) was to do an array-based comparison and graphically show their coverage.
VBScript supports regular expression matching, which makes this URL parsing straightforward:
'Truncate URLs to show top and second-level domains only, e.g. "microsoft.com"
Function ExtractDomain(Str As String) As String
'Remove everything after the first slash
Str = ParseURL(Str, "/(.+)", "")
'Remove trailing slashes
Str = ParseURL(Str, "/$", "")
'Extract top and second-level domains
Str = ParseURL(Str, "[a-zA-Z0-9._%+-]+\.([a-zA-Z0-9._%+-]+\.)", "$1")
'Make lowercase
ExtractDomain = LCase(Str)
End Function
'Search for the a regular expression in a string, and replace the regular expression
Function ParseURL(Source As String, _
MatchString As String, ReplacementString As String) As String
Option Explicit
#Const LateBind = True
#If Not LateBind Then
Dim re As RegExp
Set re = New VBScript_RegExp_55.RegExp
#Else
Dim re As Object
Set re = CreateObject("vbscript.regexp")
#End If
With re
.MultiLine = False
.Global = True
.Pattern = MatchString
End With
ParseURL = re.Replace(Source, ReplacementString)
End Function
Screenshot
Template contains a VBScript macro that parses URLs to show only top-level and second level domains. The contents of column B were automatically extracted from column A by the macro.
Download Link
Excel Template Containing VBScript
(Works with Excel 97-2003 and higher)