Monday, 26 February 2018

regex - Regular Expressions in Excel VBA



I'm using the Microsoft regular expression engine in Excel VBA. I'm very new to regex but I have a pattern working right now. I need to expand it and I'm having trouble. Here is my code so far:



Sub ImportFromDTD()

Dim sDTDFile As Variant
Dim ffile As Long
Dim sLines() As String

Dim i As Long
Dim Reg1 As RegExp
Dim M1 As MatchCollection
Dim M As Match
Dim myRange As Range

Set Reg1 = New RegExp

ffile = FreeFile


sDTDFile = Application.GetOpenFilename("DTD Files,*.XML", , _
"Browse for file to be imported")

If sDTDFile = False Then Exit Sub '(user cancelled import file browser)


Open sDTDFile For Input Access Read As #ffile
Lines = Split(Input$(LOF(ffile), #ffile), vbNewLine)
Close #ffile


Cells(1, 2) = "From DTD"
J = 2

For i = 0 To UBound(Lines)

'Debug.Print "Line"; i; "="; Lines(i)

With Reg1
'.Pattern = "(\<\!ELEMENT\s)(\w*)(\s*\(\#\w*\)\s*\>)"
.Pattern = "(\<\!ELEMENT\s)(\w*)(\s*\(\#\w*\)\s*\>)"


.Global = True
.MultiLine = True
.IgnoreCase = False
End With

If Reg1.Test(Lines(i)) Then
Set M1 = Reg1.Execute(Lines(i))
For Each M In M1
sExtract = M.SubMatches(1)

sExtract = Replace(sExtract, Chr(13), "")
Cells(J, 2) = sExtract
J = J + 1
'Debug.Print sExtract
Next M
End If
Next i

Set Reg1 = Nothing


End Sub


Currently, I'm matching on a set of data like this:



 


and extract Dealnumber but now, I need to add another match on data like this:







and extract just Dealparty without the Parens and the +



I've been using this as a reference and it's awesome but I'm still a bit confused. How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops



EDIT



I have come across a few new scenarios that have to be matched on.




 Extract Deal


Extract DealParty the ?,CR are throwing me off
LiabilityPercent,AgentInd,FacilityNo?,PartyReferenceNo?,
PartyAddlReferenceNo?,PartyEffectiveDate?,FeeRate?,ChargeType?) >

Extract Deals



Answer



You could use this Regex pattern;



  .Pattern = "\<\!ELEMENT\s+(\w+)\s+\((#\w+|(\w+)\+)\)\s+\>"



  1. This portion




(#\w+|(\w+)\+)



says match either




#a-z0-9
a-z0-9+





inside the parentheses.



ie match either




(#PCDATA)
(DealParty+)




to validate the entire string





  1. Then the submatches are used to extract DealNumber for the first valid match, DealParty for the other valid match



edited code below - note submatch is now M.submatches(0)



    Sub ImportFromDTD()

Dim sDTDFile As Variant
Dim ffile As Long

Dim sLines() As String
Dim i As Long
Dim Reg1 As RegExp
Dim M1 As MatchCollection
Dim M As Match
Dim myRange As Range

Set Reg1 = New RegExp
J = 1


strIn = " "

With Reg1
.Pattern = "\<\!ELEMENT\s+(\w+)\s+\((#\w+|(\w+)\+)\)\s+\>"
.Global = True
.MultiLine = True
.IgnoreCase = False
End With

If Reg1.Test(strIn) Then

Set M1 = Reg1.Execute(strIn)
For Each M In M1
sExtract = M.SubMatches(2)
If Len(sExtract) = 0 Then sExtract = M.SubMatches(0)
sExtract = Replace(sExtract, Chr(13), "")
Cells(J, 2) = sExtract
J = J + 1
Next M
End If


Set Reg1 = Nothing

End Sub

No comments:

Post a Comment

casting - Why wasn&#39;t Tobey Maguire in The Amazing Spider-Man? - Movies &amp; TV

In the Spider-Man franchise, Tobey Maguire is an outstanding performer as a Spider-Man and also reprised his role in the sequels Spider-Man...