I tried doing this:
using System;
using System.Collections.Generic;
using System.Text;
namespace UrlsDetector
{
class UrlDetector
{
public static string RemoveUrl(string input)
{
var words = input;
while(words.Contains("https://"))
{
string urlToRemove = words.Substring("https://", @" ");
words = words.Replace("https://" + urlToRemove, @"");
}
}
}
class Program
{
static void Main()
{
Console.WriteLine(UrlDetector.RemoveUrl(
"I saw a cat and a horse on https://www.youtube.com/"));
}
}
}
but it doesn't work.
What I want to achieve is to remove the entire "https://www.youtube.com/" and display "I saw a cat and a horse on".
I also want to display a message like "the sentence you input doesn't have a URL" if the sentence doesn't contain any URL. As you can see, I haven't written any code for that yet. I just need to fix this code first, but if you want to help me with that too, I would gladly appreciate it.
Thanks for your responses.
CodePudding user response:
If you are looking for a non-regex way to do this, here you go. The method below assumes that a URL begins with "http://" or "https://", so it will not work with URLs that begin with something like ftp:// or file://, although the code can easily be modified to support those. It also assumes the URL continues until it reaches either the end of the string or a whitespace character (a space, tab, or newline). Again, this can easily be changed if your requirements are different.
Also, if the string contains no URL, the method currently returns an empty string for both the extracted URL and the remaining text. You can modify this easily too!
using System;
public class Program
{
public static void Main()
{
string str = "I saw a cat and a horse on https://www.youtube.com/";
UrlExtraction extraction = RemoveUrl(str);
Console.WriteLine("Original Text: " + extraction.OriginalText);
Console.WriteLine();
Console.WriteLine("Url: " + extraction.ExtractedUrl);
Console.WriteLine("Text: " + extraction.TextWithoutUrl);
}
private static UrlExtraction RemoveUrl(string str)
{
if (String.IsNullOrWhiteSpace(str))
{
return new UrlExtraction("", "", "");
}
int startIndex = str.IndexOf("https://",
StringComparison.InvariantCultureIgnoreCase);
if (startIndex == -1)
{
startIndex = str.IndexOf("http://",
StringComparison.InvariantCultureIgnoreCase);
}
if (startIndex == -1)
{
return new UrlExtraction(str, "", "");
}
int endIndex = startIndex;
while (endIndex < str.Length && !IsWhiteSpace(str[endIndex]))
{
endIndex++;
}
return new UrlExtraction(str, str.Substring(startIndex, endIndex - startIndex),
str.Remove(startIndex, endIndex - startIndex));
}
private static bool IsWhiteSpace(char c)
{
return
c == '\n' ||
c == '\r' ||
c == ' ' ||
c == '\t';
}
private class UrlExtraction
{
public string ExtractedUrl {get; set;}
public string TextWithoutUrl {get; set;}
public string OriginalText {get; set;}
public UrlExtraction(string originalText, string extractedUrl,
string textWithoutUrl)
{
OriginalText = originalText;
ExtractedUrl = extractedUrl;
TextWithoutUrl = textWithoutUrl;
}
}
}
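As a rough sketch of the "easily modified" part, the scheme check could be driven by a list instead of two hard-coded IndexOf calls. The class name SchemeAwareUrlRemover, the Schemes list, and both method names are made up for illustration and are not part of the answer above. This version also picks whichever scheme appears earliest in the text, and it returns the input unchanged when no URL is found.

```csharp
using System;

public static class SchemeAwareUrlRemover
{
    // Hypothetical list of schemes to recognize; extend it as your requirements need.
    private static readonly string[] Schemes = { "https://", "http://", "ftp://", "file://" };

    // Returns the index of the earliest scheme occurrence, or -1 if none is found.
    public static int FindUrlStart(string text)
    {
        int best = -1;
        foreach (string scheme in Schemes)
        {
            int i = text.IndexOf(scheme, StringComparison.OrdinalIgnoreCase);
            if (i != -1 && (best == -1 || i < best)) best = i;
        }
        return best;
    }

    // Removes the first URL, assumed to run until whitespace or the end of the string.
    public static string RemoveFirstUrl(string text)
    {
        int start = FindUrlStart(text);
        if (start == -1) return text; // no URL: hand the input back unchanged

        int end = start;
        while (end < text.Length && !char.IsWhiteSpace(text[end])) end++;
        return text.Remove(start, end - start);
    }
}

public static class SchemeDemo
{
    public static void Main()
    {
        Console.WriteLine(SchemeAwareUrlRemover.RemoveFirstUrl("download it from ftp://host/file today"));
        Console.WriteLine(SchemeAwareUrlRemover.RemoveFirstUrl("nothing to remove here"));
    }
}
```

Like the original, this only removes the first URL and leaves the surrounding spaces in place, so you may want to trim or collapse whitespace afterwards.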
CodePudding user response:
Here is a simplified version of what you're doing. Instead of using Substring or IndexOf, I split the input into a list of strings and remove the items that start with a URL prefix. I iterate over the list in reverse because removing an item in a forward loop shifts the following items down by one, so the loop would skip the item right after each removal.
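// Requires: using System.Collections.Generic; using System.Linq;
public static string RemoveUrl(string input)
{
    // Split on single spaces so each URL ends up as its own item.
    List<string> words = input.Split(' ').ToList();
    // Walk backwards so RemoveAt(i) never shifts an item we have not checked yet.
    for (int i = words.Count - 1; i >= 0; i--)
    {
        if (words[i].StartsWith("https://")) words.RemoveAt(i);
    }
    return string.Join(" ", words);
}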
This method's advantage is that it avoids calling Substring and Replace inside a loop. Strings are immutable, so each of those calls creates a new string, and with a lot of data that repeated copying can put pressure on the garbage collector and bloat the managed heap. Split and Join still allocate, but they do so in a single pass, which tends to cost less than rebuilding the whole string on every iteration.
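To see why the loop runs in reverse, here is a small sketch comparing both directions on the same list. The class and method names (RemovalOrderDemo, RemoveForward, RemoveBackward) and the example.com-style URLs are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class RemovalOrderDemo
{
    // Forward loop: after RemoveAt(i), the next item slides into slot i
    // and the i++ step moves past it without checking it.
    public static List<string> RemoveForward(List<string> words)
    {
        var copy = new List<string>(words);
        for (int i = 0; i < copy.Count; i++)
        {
            if (copy[i].StartsWith("https://")) copy.RemoveAt(i);
        }
        return copy;
    }

    // Backward loop: removing item i only shifts items that were already checked.
    public static List<string> RemoveBackward(List<string> words)
    {
        var copy = new List<string>(words);
        for (int i = copy.Count - 1; i >= 0; i--)
        {
            if (copy[i].StartsWith("https://")) copy.RemoveAt(i);
        }
        return copy;
    }

    public static void Main()
    {
        var words = new List<string> { "see", "https://a.example", "https://b.example", "now" };
        Console.WriteLine(string.Join(" ", RemoveForward(words)));  // see https://b.example now
        Console.WriteLine(string.Join(" ", RemoveBackward(words))); // see now
    }
}
```

The forward version only misbehaves when two URLs sit next to each other, which is why the bug is easy to miss in a quick test.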
CodePudding user response:
A better way is to use Split together with a StringBuilder. The code would look like this. StringBuilder is optimized for this kind of situation, where a string is built up piece by piece.
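Code:
// Requires: using System.Text;
public static string RemoveUrl(string input)
{
    var sb = new StringBuilder();
    foreach (var word in input.Split(' '))
    {
        // Keep every word that does not start with the URL prefix.
        if (!word.StartsWith("https://")) sb.Append(word + " ");
    }
    // Drop the trailing space appended after the last kept word.
    return sb.ToString().TrimEnd();
}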
CodePudding user response:
Basic string manipulation gets awkward quickly for this kind of task.
Using regular expressions makes this very easy for you.
Search for a piece of text that looks like "http(s)?:\/\/\S*[^\s\.]":
http : the literal text http
(s)? : the optional (?) letter s
:\/\/ : the characters ://
\S* : any number (*) of non-whitespace characters (\S)
[^\s\.] : any character that is not (^) in the list ([ ]) of whitespace characters (\s) or dot (\.). This allows you to exclude the dot at the end of a sentence from your URL.
using System;
using System.Text.RegularExpressions;
namespace UrlsDetector
{
internal class Program
{
static void Main(string[] args)
{
Console.WriteLine(UrlDetector.RemoveUrl(
"I saw a cat and a horse on https://www.youtube.com/ and also on http://www.example.com."));
Console.ReadLine();
}
}
class UrlDetector
{
public static string RemoveUrl(string input)
{
var regex = new Regex(@"http(s)?:\/\/\S*[^\s.]");
return regex.Replace(input, "");
}
}
}
With regular expressions you can also check for matches using Regex.Match(...), which lets you detect whether your text contains any URLs at all.
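For the "no URL" message asked about in the question, one possible sketch uses Regex.IsMatch before replacing. The class name UrlMessageDemo, the method name RemoveUrlOrReport, and the exact message wording are assumptions made for this example. The pattern is the same idea as above, written as https?:// for brevity.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlMessageDemo
{
    // Same idea as the pattern above: http or https, then non-whitespace
    // characters, ending on something that is not whitespace or a dot.
    private static readonly Regex UrlPattern = new Regex(@"https?://\S*[^\s.]");

    // Returns the text without URLs, or a notice when the text has no URL.
    public static string RemoveUrlOrReport(string input)
    {
        if (!UrlPattern.IsMatch(input))
        {
            return "The sentence you input doesn't have a URL.";
        }
        // TrimEnd removes the space left behind when the URL was the last word.
        return UrlPattern.Replace(input, "").TrimEnd();
    }

    public static void Main()
    {
        Console.WriteLine(RemoveUrlOrReport("I saw a cat and a horse on https://www.youtube.com/"));
        Console.WriteLine(RemoveUrlOrReport("I saw a cat and a horse"));
    }
}
```

The first call prints "I saw a cat and a horse on" and the second prints the notice. If you need the URLs themselves rather than a yes/no answer, Regex.Matches returns every match in the input.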