Extract a string from string, choosing the method

Time:07-15

I have a string which looks like this: [Error] Failed to process site: https://xxxxxxxxx/teams/xxxxxx. The remote server returned an error: (404) Not Found.

The challenge is to extract the highlighted part of this string, i.e. the trailing xxxxxx segment of the URL.

I tried some split operations, but without any success.

Answer:

Use the -match operator to perform a regex search for the URL, then extract the matched string value:

# define input string
$errorString = '[Error] Failed to process site: https://some.domain.tld/teams/xxxxxx. The remote server returned an error: (404) Not Found.'

# define regex pattern for a URL followed by a literal dot
$urlFollowedByDotPattern = 'https?://(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b(?:[-a-zA-Z0-9()@:%_\+.~#?&//=]*)(?=\.)'

# perform match comparison
if($errorString -match $urlFollowedByDotPattern){
    # extract the substring containing the URL matched by the pattern
    $URL = $Matches[0]

    # remove everything before the last slash
    $pathSuffix = $URL -replace '^.*/'
}

If the URL was found, the $pathSuffix variable now contains the trailing xxxxxx part.
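
A possible variation on the last step, sketched here and not taken from the answer: instead of stripping everything up to the last slash with -replace, the matched URL can be cast to .NET's [uri] ([System.Uri]) type, whose Segments property returns the path split into its segments. The sample string and pattern are the answer's; the TrimEnd('/') call is an added assumption for URLs that end in a slash.

```powershell
# Sample input from the answer
$errorString = '[Error] Failed to process site: https://some.domain.tld/teams/xxxxxx. The remote server returned an error: (404) Not Found.'

# URL followed by a literal dot (the answer's pattern)
$urlFollowedByDotPattern = 'https?://(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b(?:[-a-zA-Z0-9()@:%_\+.~#?&//=]*)(?=\.)'

$pathSuffix = $null
if ($errorString -match $urlFollowedByDotPattern) {
    # Let [System.Uri] parse the matched URL.
    $uri = [uri] $Matches[0]

    # Segments is '/', 'teams/', 'xxxxxx': take the last one and
    # drop a trailing '/' in case the URL ends with one (an assumption).
    $pathSuffix = $uri.Segments[-1].TrimEnd('/')
}

$pathSuffix  # -> xxxxxx
```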

Answer:

To offer a concise alternative to Mathias R. Jessen's helpful answer, using the regex-based -replace operator:

-replace can be special-cased for substring extraction by:

  • formulating a regex that matches the entire input string
  • using a capture group ((...)) inside that regex to capture the substring of interest, and referencing that group in the replacement string; e.g., $1 in the replacement string refers to what the first capture group captured.

$str = '[Error] Failed to process site: https://xxxxxxxxx/teams/xxxxxx. The remote server returned an error: (404) Not Found.'

$str -replace '.+https://.+/([^/]+)\..+', '$1' # -> 'xxxxxx'

For an explanation of the regex and the option to experiment with it, see this regex101.com page.
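
The same whole-string-match technique also works with a named capture group, which can make longer patterns easier to read. In this sketch the group name suffix is an arbitrary choice; ${suffix} in the replacement string refers to it, and the single quotes keep PowerShell from expanding ${suffix} as a variable of its own.

```powershell
$str = '[Error] Failed to process site: https://xxxxxxxxx/teams/xxxxxx. The remote server returned an error: (404) Not Found.'

# (?<suffix>...) names the capture group; '${suffix}' is the .NET
# substitution token for it (single-quoted, so PowerShell leaves it alone).
$suffix = $str -replace '.+https://.+/(?<suffix>[^/]+)\..+', '${suffix}'

$suffix  # -> xxxxxx
```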

Note:

  • If the regex does not match, the whole input string is returned as-is.

  • If you'd rather return an empty string in that case, use a technique suggested by zett42: append |.* to the regex. This alternation (|) unconditionally matches the whole input string with .* whenever the original regex doesn't match; since no capture group participates in that case, $1 evaluates to the empty string, which becomes the effective return value:

    # Note the addition of |.*
    'this has no URL' -replace '.+https://.+/([^/]+)\..+|.*', '$1' # -> ''
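
Because -replace also accepts an array as its left-hand operand and processes each element separately, the |.* variant can be applied to several log lines at once. The sample lines below (with alpha and beta as team names) are made up for illustration:

```powershell
# Hypothetical log lines; the middle one contains no URL.
$lines = @(
  '[Error] Failed to process site: https://xxxxxxxxx/teams/alpha. The remote server returned an error: (404) Not Found.'
  'this has no URL'
  '[Error] Failed to process site: https://xxxxxxxxx/teams/beta. The remote server returned an error: (403) Forbidden.'
)

# -replace runs once per element; lines without a URL become ''.
$suffixes = $lines -replace '.+https://.+/([^/]+)\..+|.*', '$1'

$suffixes  # -> alpha, (empty line), beta
```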
    

If such a regex is too mind-bending, you can try a multi-step approach that combines -split and -like:

$str = '[Error] Failed to process site: https://xxxxxxxxx/teams/xxxxxx. The remote server returned an error: (404) Not Found.'

# -> 'xxxxxx'
# Note the line continuations (` at the very end of the lines)
# required to spread the command across multiple lines for readability.
(
  -split $str        <#  split into tokens by whitespace #> `
  -like 'https://*'  <# select the token that starts with 'https://' #> `
  -split '/'         <# split it into URL components by '/' #>
)[-1]?.TrimEnd('.')  <# select the last component and trim the trailing "." #>
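
To see what each stage of the pipeline above produces, the same steps can be run one at a time with intermediate variables. This is an illustrative sketch; the variable names are arbitrary:

```powershell
$str = '[Error] Failed to process site: https://xxxxxxxxx/teams/xxxxxx. The remote server returned an error: (404) Not Found.'

$tokens     = -split $str                # whitespace-separated tokens
$urlToken   = $tokens -like 'https://*'  # -like on an array acts as a filter
$components = $urlToken -split '/'       # 'https:', '', 'xxxxxxxxx', 'teams', 'xxxxxx.'
$last       = $components[-1]            # 'xxxxxx.'
$result     = $last.TrimEnd('.')         # 'xxxxxx'

$result
```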

Note:

  • The use of ?., the null-conditional member-access operator, prevents a statement-terminating error if no token of interest is found ($null is returned instead), but it requires PowerShell (Core) 7.1+.

  • In earlier versions, or as an alternative, you can replace ?.TrimEnd('.') with -replace '\.$', which returns an empty string instead in that case.

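
For versions before 7.1, a sketch of the -replace '\.$' variant might look like the following, wrapped in a function so it can be tried on both kinds of input. The function name Get-TrailingUrlSegment is hypothetical, not an existing cmdlet. When no https:// token is found, indexing the empty result yields $null, which -replace turns into an empty string:

```powershell
# Hypothetical helper: returns the last '/'-separated component of the
# first https:// token, minus one trailing '.', or '' if there is none.
function Get-TrailingUrlSegment([string] $Text) {
  (
    -split $Text `
    -like 'https://*' `
    -split '/'
  )[-1] -replace '\.$'
}

Get-TrailingUrlSegment '[Error] Failed to process site: https://xxxxxxxxx/teams/xxxxxx. The remote server returned an error: (404) Not Found.'  # -> xxxxxx
Get-TrailingUrlSegment 'this has no URL'  # -> '' (empty string)
```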