How to strip links with HTML contents to plain text in C#

Time:04-27

The input is:

"link: <http://www.google.com|www.google.com> link1: <http://www.jira.com|www.jira.com>\n\n\n"

It needs to be displayed as:

"link: www.google.com link1: www.jira.com"

Is there a solution for this?

CodePudding user response:

You could try a Regex:

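using System.Text.RegularExpressions;

// Replace each <url|label> with its label, then trim the trailing newlines.
string output = Regex.Replace(
    input: "link: <http://www.google.com|www.google.com> link1: <http://www.jira.com|www.jira.com>\n\n\n",
    pattern: @"<[^|]*\|([^>]*)>",
    replacement: "$1").Trim();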

Output:

link: www.google.com link1: www.jira.com

Working example: https://dotnetfiddle.net/fn99jz
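
If the same replacement is needed in several places, the call above could be wrapped in a small helper. This is only a sketch: the names LinkStripper and StripLinks are made up for illustration, and trimming the surrounding whitespace is an assumption based on the desired output in the question, which has no trailing newlines.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the class and method names are not from the original answer.
public static class LinkStripper
{
    // Same pattern as in the answer: matches <url|label> and captures the label in group 1.
    private static readonly Regex LinkPattern =
        new Regex(@"<[^|]*\|([^>]*)>", RegexOptions.Compiled);

    // Replaces every <url|label> with its label, then trims leading and
    // trailing whitespace (assumed to be wanted, as in the question's output).
    public static string StripLinks(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return string.Empty;
        }

        return LinkPattern.Replace(text, "$1").Trim();
    }
}
```

Calling LinkStripper.StripLinks with the input string from the question should return "link: www.google.com link1: www.jira.com". Keeping the Regex in a static field avoids re-parsing the pattern on every call.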

Regex breakdown:

<           // Match a literal '<'.
[^|]*       // Match all characters until reaching a '|'.
\|          // Match a literal '|' (needs escaping).
(           // Start capturing all characters that
            //   match the following expression.
  [^>]*     // Match all characters until reaching a '>'. 
)           // Stop capturing and store the previous match
            //   in the reference '$1'.
>           // Match a literal '>'.