I would like to use regex to retrieve text between two words. This text has xml tags but isn't xml

Time:02-23

For example, I have a bunch of unparsed text output from a command that I am looping through, and I would like to get the text between two tags. I've tried (.*?) \([</Location>])$ and nothing happened. Not a single thing. So in this body of text, for example, I need the paths inside the <Location> element:

<?xml version="1.0" encoding="utf-16"?><AppMgmtDigest xmlns="http://schemas.microsoft.com/SystemCenterConfigurationManager/2009/AppMgmtDigest" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><Application AuthoringScopeId="ScopeId_844389FD-D138-4D2A-BF1E-BFEAB11391B5" LogicalName="Application_0487d42d-94f8-4424-bd10-693005c74d9c" Version="11"><DisplayInfo DefaultLanguage="en-US"><Info Language="en-US"><Title>Update BeyondTrust</Title><ReleaseDate>2022-01-14</ReleaseDate></Info></DisplayInfo><DeploymentTypes><DeploymentType AuthoringScopeId="ScopeId_844389FD-D138-4D2A-BF1E-BFEAB11391B5" LogicalName="DeploymentType_3f86c80a-f4d6-4c63-b066-7c030730456a" Version="11"/></DeploymentTypes><Title ResourceId="Res_163096156">Update BeyondTrust</Title><ReleaseDate ResourceId="Res_2088816488">2022-01-14</ReleaseDate><Owners><User Qualifier="LogonName" Id="Admin.MH"/></Owners><Contacts><User Qualifier="LogonName" Id="Admin.MH"/></Contacts></Application><DeploymentType AuthoringScopeId="ScopeId_844389FD-D138-4D2A-BF1E-BFEAB11391B5" LogicalName="DeploymentType_3f86c80a-f4d6-4c63-b066-7c030730456a" Version="11"><Title ResourceId="Res_1162077075">Update BeyondTrust</Title><DeploymentTechnology>GLOBAL/ScriptDeploymentTechnology</DeploymentTechnology><Technology>Script</Technology><Hosting>Native</Hosting><Installer Technology="Script"><ExecutionContext>System</ExecutionContext><Contents><Content ContentId="Content_27d453bb-3439-4440-a90b-ddd731e5a4a7" Version="1"><File Name="PrivilegeManagementConsoleAdapter_x64.msi" Size="7425536"/><File Name="PrivilegeManagementForWindows_x64.msi" Size="21287936"/><File Name="remediate.ps1" Size="3020"/><Location>\\pennoni.com\util\Software\BeyondTrust\PMCloud\application_sccm\</Location><PeerCache>true</PeerCache><OnFastNetwork>Download</OnFastNetwork><OnSlowNetwork>DoNothing</OnSlowNetwork></Content></Contents><DetectAction><Provider>Local</Provider><Args><Arg Name="ExecutionContext" Type="String">System</Arg><Arg Name="MethodBody" 
Type="String">&lt;?xml version="1.0" encoding="utf-16"?&gt;

Basically, in a body of text, I want to retrieve the text between

<Location> pathThatINeed </Location>

CodePudding user response:

here is a solution that uses regex ONLY for the -split operator. this presumes your sample line of text is stored in $Test. [grin]

the code ...

(($Test -split '<location>')[1] -split '</location>')[0]

the result ...

\\pennoni.com\util\Software\BeyondTrust\PMCloud\application_sccm\
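
A runnable sketch of the two-step split above, assuming $Test holds a shortened copy of the sample text (the shortened string is an illustration, not the full command output). It relies on -split being case-insensitive by default, which is why '<location>' matches '<Location>'.

```powershell
# Shortened sample, adapted from the question's text (assumption: one <Location> element).
$Test = '<Content><File Name="remediate.ps1" Size="3020"/><Location>\\pennoni.com\util\Software\BeyondTrust\PMCloud\application_sccm\</Location><PeerCache>true</PeerCache></Content>'

# Step 1: keep everything after the opening tag.
$afterOpen = ($Test -split '<location>')[1]

# Step 2: keep everything before the closing tag.
$path = ($afterOpen -split '</location>')[0]

$path
```

Note that this only returns the first <Location> value; text with several such elements would need a loop or a different approach.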

CodePudding user response:

Lee Dailey's helpful answer offers a pragmatic solution that is easy to conceptualize.

To offer a single-operation alternative using the regex-based -replace operator:

# $text is assumed to contain the (incomplete) input XML text.
$text -replace '^.*<location>(.+?)</location>.*$', '$1'

For an explanation of the regex and the ability to experiment with it, see this regex101.com page.
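
One thing to watch with -replace: when the pattern does not match, the input comes back unchanged, so a failure can look like a result. The sketch below shows that case and a -match variant that reports success explicitly. The sample string is adapted from the question, and the second line after the line break is a hypothetical addition for illustration.

```powershell
# Sample adapted from the question; the part after the line break is hypothetical.
$text = "<Content><Location>\\pennoni.com\util\Software\BeyondTrust\PMCloud\application_sccm\</Location></Content>`n<DetectAction/>"

# The anchored pattern must match the whole string, but '.' does not match
# a line break, so nothing is replaced and the input is returned as-is.
$replaced = $text -replace '^.*<location>(.+?)</location>.*$', '$1'
$unchanged = $replaced -eq $text

# -match returns $true or $false; on success, $Matches[1] holds the capture group.
$path = if ($text -match '<location>(.+?)</location>') { $Matches[1] } else { $null }

$path
```

If the input may contain line breaks, prefixing the -replace pattern with the inline option (?s) lets '.' match them as well.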

CodePudding user response:

That should do the trick:

(?<=<Location>).*?(?=</Location>)

Output:

 THisismyDesiredText 

Explanation:

  • (?<=): Positive Lookbehind
  • .*?: Matches any character between zero and unlimited times, as few times as possible (lazy)
  • (?=): Positive Lookahead
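
As a sketch of how the lookaround pattern could be applied in PowerShell, the example below uses the .NET [regex]::Matches method to collect every <Location> value rather than just the first. The sample string and its two values are made up for illustration. Unlike PowerShell's own operators, [regex]::Matches is case-sensitive by default, so the tag names must be written as they appear in the text.

```powershell
# Hypothetical sample with two <Location> elements.
$text = '<Content><Location> first </Location></Content><Content><Location> second </Location></Content>'

# Lookbehind and lookahead assert the tags without including them in the match.
$pattern = '(?<=<Location>).*?(?=</Location>)'

# Collect every match and trim the surrounding spaces.
$values = [regex]::Matches($text, $pattern) | ForEach-Object { $_.Value.Trim() }

$values
```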