I have a string which looks like this: [Error] Failed to process site: https://xxxxxxxxx/teams/xxxxxx. The remote server returned an error: (404) Not Found.
The challenge is to extract the highlighted part of this string (the trailing xxxxxx segment of the URL).
I tried some split operations, but without any success.
CodePudding user response:
Use the -match operator to perform a regex search for the URL, then extract the matched string value:
# define input string
$errorString = '[Error] Failed to process site: https://some.domain.tld/teams/xxxxxx. The remote server returned an error: (404) Not Found.'
# define regex pattern for a URL followed by a literal dot
$urlFollowedByDotPattern = 'https?://(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b(?:[-a-zA-Z0-9()@:%_\+.~#?&//=]*)(?=\.)'
# perform match comparison
if ($errorString -match $urlFollowedByDotPattern) {
    # extract the substring containing the URL matched by the pattern
    $URL = $Matches[0]
    # remove everything before the last slash
    $pathSuffix = $URL -replace '^.*/'
}
If the URL was found, the $pathSuffix variable now contains the trailing xxxxxx part.
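As a variation on the same idea, and only as a sketch: a capture group lets -match hand back the last path segment directly via $Matches[1], without the second -replace step. The pattern below is a simplified assumption (it is not the URL pattern from the answer) and expects the URL to contain no whitespace and to be followed by a dot and a space.

```powershell
$errorString = '[Error] Failed to process site: https://some.domain.tld/teams/xxxxxx. The remote server returned an error: (404) Not Found.'

# Hypothetical, simplified pattern:
#   https?://\S*/   the URL up to its last slash (no whitespace allowed)
#   ([^/\s]+)       capture group 1: the last path segment
#   \.\s            the sentence-ending dot, followed by whitespace
$pattern = 'https?://\S*/([^/\s]+)\.\s'

# $Matches[1] holds what the first capture group matched; $null if there was no match.
$pathSuffix = if ($errorString -match $pattern) { $Matches[1] } else { $null }
$pathSuffix   # -> 'xxxxxx'
```

The stricter pattern from the answer is the safer choice if the input may contain other URL-like text.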
CodePudding user response:
To offer a concise alternative to Mathias R. Jessen's helpful answer using the regex-based -replace operator:
-replace can be special-cased for substring extraction by:
- formulating a regex that matches the entire input string
- using a capture group ((...)) inside that regex to capture the substring of interest and using that as the replacement string; e.g., $1 in the replacement string refers to what the first capture group captured.
$str = '[Error] Failed to process site: https://xxxxxxxxx/teams/xxxxxx. The remote server returned an error: (404) Not Found.'
$str -replace '.+ https://.+/([^/]+)\..+', '$1' # -> 'xxxxxx'
For an explanation of the regex and the option to experiment with it, see this regex101.com page.
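For readers who prefer to see the pieces inline, here is a sketch of the same pattern written in free-spacing mode ((?x)), where whitespace in the pattern is ignored and # starts a comment. The variable names are only illustrative, and the literal space before https:// has to be written as [ ] in this mode.

```powershell
$str = '[Error] Failed to process site: https://xxxxxxxxx/teams/xxxxxx. The remote server returned an error: (404) Not Found.'

# (?x) turns on free-spacing mode so the regex can carry its own comments.
$regex = '(?x)
  .+[ ]https://   # anything, then a space and the literal URL scheme
  .+/             # greedily everything up to the last slash in the input
  ([^/]+)         # capture group 1: the last path segment (no slashes)
  \.              # the literal dot that follows the URL
  .+              # the rest of the input, so the whole string is matched
'

# Replacing the whole match with $1 leaves only the captured segment.
$segment = $str -replace $regex, '$1'
$segment   # -> 'xxxxxx'
```

Because the capture group is greedy, it first grabs everything after the last slash and then backtracks to the last dot that still has text after it, which is the dot right after xxxxxx.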
Note:
If the regex does not match, the whole input string is returned as-is.
If you'd rather return an empty string in that case, use a technique suggested by zett42: append |.* to the regex, which alternatively (|), if the original regex didn't match, unconditionally matches the whole input string with .*, in which case - since no capture group is then used - $1 evaluates to the empty string as the effective return value:
# Note the addition of |.*
'this has no URL' -replace '.+ https://.+/([^/]+)\..+|.*', '$1' # -> ''
If such a regex is too mind-bending, you can try a multi-step approach that combines -split and -like:
$str = '[Error] Failed to process site: https://xxxxxxxxx/teams/xxxxxx. The remote server returned an error: (404) Not Found.'
# -> 'xxxxxx'
# Note the line continuations (` at the very end of the lines)
# required to spread the command across multiple lines for readability.
(
  -split $str <# split into tokens by whitespace #> `
  -like 'https://*' <# select the token that starts with 'https://' #> `
  -split '/' <# split it into URL components by '/' #>
)[-1]?.TrimEnd('.') <# select the last component and trim the trailing "." #>
Note:
The use of ?., the null-conditional member-access operator, prevents a statement-terminating error if no token of interest is found ($null is returned instead), but it requires PowerShell (Core) 7.1+. In earlier versions, or as an alternative, you can replace ?.TrimEnd('.') with -replace '\.$', which defaults to an empty string instead.
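For completeness, a minimal sketch of that -replace '\.$' variant, broken into separate steps so that no line continuations are needed. The intermediate variable names are hypothetical and chosen only for readability.

```powershell
$str = '[Error] Failed to process site: https://xxxxxxxxx/teams/xxxxxx. The remote server returned an error: (404) Not Found.'

$tokens   = -split $str                  # split into tokens by whitespace
$urlToken = $tokens -like 'https://*'    # keep the token(s) starting with 'https://'
$lastPart = ($urlToken -split '/')[-1]   # last URL component, here 'xxxxxx.'
$result   = $lastPart -replace '\.$'     # remove a single trailing '.'

$result   # -> 'xxxxxx'
```

Since this form does not use ?., it should not depend on the null-conditional operator being available.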