C#: Remove duplicate lines from the contents of a StringBuilder

Time:11-14

I am trying to create a very large wordlist with each word on a separate line. I generate the words with some logic and store them in a StringBuilder. Testing shows that some words are generated more than once, e.g.:

!AngryDogAngry1916!
@AngryAngryDog1916!
:AngryDog1916!
!AngryDogAngry1916!
...

In this example the first and fourth lines are identical, and I would like to remove one of them. How can I remove the duplicate line(s) from the StringBuilder variable? The comparison must be done line by line; otherwise the words themselves could be altered, e.g. the word !AngryDogAngry1916! must NOT be modified to !AngryDog1916!. Thanks.

I could not find a method to access the contents of a StringBuilder line by line. I don't know where to start, and I do not want to change the StringBuilder type.

CodePudding user response:

As someone suggested in a comment, use a HashSet<T> to store your words. A hash set can tell you whether a word has already been added in O(1) average time; a StringBuilder offers no such lookup, so finding a duplicate in it means scanning the whole text.

StringBuilder is great for generating text on the fly, but it is not designed for efficient searching.

Since you mentioned that you need to write the output words quickly and want to keep your current implementation, you can still use a HashSet<T> side by side with your StringBuilder to detect duplicates: before appending a new word to the StringBuilder, check the hash set to see whether the word has already been added.
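A minimal sketch of that idea, assuming every generated word passes through one append call and that lines are separated by '\n'. The class name WordListBuilder and the method TryAppend are hypothetical; it relies on HashSet<string>.Add returning false when the word is already in the set. Note that every word is kept in memory twice (once in the set, once in the builder).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical wrapper: a StringBuilder plus a HashSet that remembers what was appended.
class WordListBuilder
{
    private readonly StringBuilder _sb = new StringBuilder();

    // Ordinal comparison: words must match exactly, character by character.
    private readonly HashSet<string> _seen = new HashSet<string>(StringComparer.Ordinal);

    // Appends the word on its own line unless it was appended before.
    // Returns true if the word was appended, false if it was a duplicate.
    public bool TryAppend(string word)
    {
        if (!_seen.Add(word))
            return false; // Add returned false: word already present, skip it

        _sb.Append(word).Append('\n');
        return true;
    }

    public override string ToString() => _sb.ToString();
}
```

Calling TryAppend with !AngryDogAngry1916!, @AngryAngryDog1916!, :AngryDog1916! and !AngryDogAngry1916! again appends only the first three; the fourth call returns false and leaves the builder unchanged.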


Following our discussion in the comments: if you have to work with an already filled StringBuilder instance, iterate over its characters to split the long string back into individual words.

Each word ends with the line-break sequence (Environment.NewLine). Add each word to a HashSet<string> instance to detect duplicate entries: HashSet<string>.Add returns false when the word is already in the set.

When a duplicate is found, call the StringBuilder.Remove method to delete that line from the original StringBuilder.
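The in-place procedure above could look roughly like this. It is a sketch under stated assumptions: lines are separated by a single '\n' (with AppendLine on Windows, Environment.NewLine is "\r\n" and the separator handling would need adjusting), and the first occurrence of each line is the one kept. The names DedupInPlace and RemoveDuplicateLines are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text;

static class DedupInPlace
{
    // Removes repeated lines from sb in place, keeping the first occurrence.
    // Assumes '\n' is the line separator.
    public static void RemoveDuplicateLines(StringBuilder sb)
    {
        var seen = new HashSet<string>();
        int start = 0;

        while (start < sb.Length)
        {
            // Find the end of the current line: the next '\n' or the end of the buffer.
            int end = start;
            while (end < sb.Length && sb[end] != '\n')
                end++;

            string line = sb.ToString(start, end - start);
            bool hasSeparator = end < sb.Length;
            int lengthWithSeparator = end - start + (hasSeparator ? 1 : 0);

            if (seen.Add(line))
            {
                // First occurrence: keep it and move to the next line.
                start += lengthWithSeparator;
            }
            else if (!hasSeparator && start > 0)
            {
                // Duplicate last line with no trailing '\n': also drop the '\n'
                // that precedes it, so no dangling separator is left behind.
                sb.Remove(start - 1, end - start + 1);
            }
            else
            {
                // Duplicate line: cut it (and its '\n') out; the next line
                // now begins at 'start', so 'start' stays where it is.
                sb.Remove(start, lengthWithSeparator);
            }
        }
    }
}
```

Each Remove call shifts all the text after the removed range, and reading a very large StringBuilder through its indexer can be slow when the builder consists of many chunks, so for a huge wordlist with many duplicates this is likely to be noticeably slower than checking the hash set before appending.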

CodePudding user response:

This is how you can do it:

using System;
using System.Linq;
using System.Text;

StringBuilder strbuilder = new StringBuilder("!AngryDogAngry1916!\n@AngryAngryDog1916!\n:AngryDog1916!\n!AngryDogAngry1916!");
// Split the contents into lines and drop repeated lines
string[] splitstrings = strbuilder.ToString().Split('\n');
splitstrings = splitstrings.Distinct().ToArray();
string result = string.Join("\n", splitstrings);
// Replace the StringBuilder's contents with the de-duplicated text
strbuilder.Clear();
strbuilder.Append(result);
Console.WriteLine(result);

Result:

!AngryDogAngry1916!
@AngryAngryDog1916!
:AngryDog1916!

Another way is to remove the repetitions inside the method that generates the words, and only then copy the results into the StringBuilder. If you add your code, I can help you more.
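As a sketch of that second approach, assuming the generator can expose its output as an IEnumerable<string> (the names WordListWriter and BuildDistinct are hypothetical), Enumerable.Distinct can filter the words before anything reaches the StringBuilder:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

static class WordListWriter
{
    // Builds the word list from a sequence of generated words, dropping repeats
    // before they are appended. Distinct uses deferred execution, so words are
    // filtered as they are enumerated; in LINQ to Objects each word is yielded
    // the first time it is encountered.
    public static StringBuilder BuildDistinct(IEnumerable<string> generatedWords)
    {
        var sb = new StringBuilder();
        foreach (string word in generatedWords.Distinct())
            sb.Append(word).Append('\n'); // one word per line
        return sb;
    }
}
```

Internally Distinct keeps a set of the words it has already returned, so the memory cost is similar to keeping your own HashSet next to the StringBuilder.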
