How Can I Parse a String and Get Random Sentences in C#?

I'm trying to figure out how to parse a string in this format into a tree-like data structure of arbitrary depth, and then generate random sentences from it.

"{{Hello,Hi,Hey} {world,earth},{Goodbye,farewell} {planet,rock,globe{.,!}}}"

where

, means or
{ means expand
} means collapse up to parent

For example, I want to get output like this:

1) hello world planet.
2) hi earth globe!
3) goodbye planet.
and so on.

CodePudding user response：

I think this can be a complicated job. I used this tutorial for it, and I strongly advise you to read the entire page to understand how it works.

First, you have to pass this "tree" as an array. You can parse the string, set the array manually, or use whatever approach suits you. That matters because there isn't a good ready-made model for this kind of tree, so it's better to use a structure that already exists. Also, if you want grammatically correct output, you'll need to add a "weight" to the words and tell the code in what order to combine them.

Here is the code snippet:

using System;
using System.Text;

namespace App
{
    class Program
    {
        static void Main(string[] args)
        {
            string tree = "{{Hello,Hi,Hey} {world,earth},{Goodbye,farewell} {planet,rock,globe{.,!}}}";
            string[] words = { "Hello", "Hi", "Hey", "world", "earth", "Goodbye", "farewell", "planet", "rock", "globe" };
            RandomText text = new RandomText(words);
            text.AddContentParagraphs(12, 1, 3, 3, 3);
            string content = text.Content;
            Console.WriteLine(content);
        }
    }

    public class RandomText
    {
        static Random _random = new Random();
        StringBuilder _builder;
        string[] _words;

        public RandomText(string[] words)
        {
            _builder = new StringBuilder();
            _words = words;
        }

        public void AddContentParagraphs(int numberParagraphs, int minSentences,
        int maxSentences, int minWords, int maxWords)
        {
            for (int i = 0; i < numberParagraphs; i++)
            {
                AddParagraph(_random.Next(minSentences, maxSentences + 1),
                     minWords, maxWords);
                _builder.Append("\n\n");
            }
        }

        void AddParagraph(int numberSentences, int minWords, int maxWords)
        {
            for (int i = 0; i < numberSentences; i++)
            {
                int count = _random.Next(minWords, maxWords + 1);
                AddSentence(count);
            }
        }

        void AddSentence(int numberWords)
        {
            StringBuilder b = new StringBuilder();
            // Add n words together.
            for (int i = 0; i < numberWords; i++) // Number of words
            {
                b.Append(_words[_random.Next(_words.Length)]).Append(" ");
            }
            string sentence = b.ToString().Trim() + ". ";
            // Uppercase sentence
            sentence = char.ToUpper(sentence[0]) + sentence.Substring(1);
            // Add this sentence to the class
            _builder.Append(sentence);
        }

        public string Content
        {
            get
            {
                return _builder.ToString();
            }
        }
    }
}
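The answer above mentions adding a "weight" to words but does not show how. One possible reading is weighted random selection, where a word with a higher weight is picked more often. The sketch below is only an illustration under that assumption: `WeightedPicker`, `Pick`, and the weights in `Main` are hypothetical names and values, not part of the `RandomText` class or the tutorial it is based on.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the answer's RandomText class.
public static class WeightedPicker
{
    // Picks one word with probability proportional to its weight.
    // Assumes the list is non-empty and every weight is positive.
    public static string Pick(IReadOnlyList<(string Word, int Weight)> entries, Random random)
    {
        int total = 0;
        foreach (var entry in entries)
        {
            total += entry.Weight;
        }

        // roll is in [0, total); walk the list until the running sum passes it.
        int roll = random.Next(total);
        foreach (var entry in entries)
        {
            if (roll < entry.Weight)
            {
                return entry.Word;
            }
            roll -= entry.Weight;
        }
        throw new InvalidOperationException("Weights must be positive.");
    }
}

public static class WeightedDemo
{
    public static void Main()
    {
        // Illustrative weights: "Hello" should appear about five times as often as "Hey".
        var greetings = new List<(string Word, int Weight)>
        {
            ("Hello", 5), ("Hi", 3), ("Hey", 1),
        };
        var random = new Random();
        for (int i = 0; i < 5; i++)
        {
            Console.WriteLine(WeightedPicker.Pick(greetings, random));
        }
    }
}
```

Passing the `Random` instance in as a parameter, rather than creating it inside `Pick`, makes it possible to seed it for repeatable runs.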

CodePudding user response：

The input string must be parsed. Since it can contain nested braces, we need a recursive parser. But to begin with, we need a data model to represent the tree structure.

We can have three different types of items in this tree: text, a list representing a sequence and a list representing a choice. Let's derive three classes from this abstract base class:

abstract public class TreeItem
{
    public abstract string GetRandomSentence();
}

The TextItem class simply returns its text as "random sentence":

public class TextItem : TreeItem
{
    public TextItem(string text)
    {
        Text = text;
    }

    public string Text { get; }

    public override string GetRandomSentence()
    {
        return Text;
    }
}

The sequence concatenates the text of its items:

public class SequenceItem : TreeItem
{
    public SequenceItem(List<TreeItem> items)
    {
        Items = items;
    }

    public List<TreeItem> Items { get; }

    public override string GetRandomSentence()
    {
        var sb = new StringBuilder();
        foreach (var item in Items) {
            sb.Append(item.GetRandomSentence());
        }
        return sb.ToString();
    }
}

The choice item is the only one using randomness to pick one random item from the list:

public class ChoiceItem : TreeItem
{
    private static readonly Random _random = new();

    public ChoiceItem(List<TreeItem> items)
    {
        Items = items;
    }

    public List<TreeItem> Items { get; }

    public override string GetRandomSentence()
    {
        int index = _random.Next(Items.Count);
        return Items[index].GetRandomSentence();
    }
}

Note that the sequence and choice items both call GetRandomSentence() on their own items, which is how the call descends the tree recursively.
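To see how the three item types work together before any parsing is involved, a tree can be built by hand. The sketch below restates the classes in condensed form only so that it compiles on its own; `TreeDemo` and `BuildGreeting` are hypothetical names added for illustration. The tree corresponds to the fragment "{Hello,Hi,Hey} {world,earth}", with the space between the two brace groups stored as its own text item.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Condensed restatement of the classes above, so the snippet compiles on its own.
public abstract class TreeItem
{
    public abstract string GetRandomSentence();
}

public class TextItem : TreeItem
{
    public TextItem(string text) { Text = text; }
    public string Text { get; }
    public override string GetRandomSentence() => Text;
}

public class SequenceItem : TreeItem
{
    public SequenceItem(List<TreeItem> items) { Items = items; }
    public List<TreeItem> Items { get; }
    public override string GetRandomSentence() =>
        string.Concat(Items.Select(item => item.GetRandomSentence()));
}

public class ChoiceItem : TreeItem
{
    private static readonly Random _random = new Random();
    public ChoiceItem(List<TreeItem> items) { Items = items; }
    public List<TreeItem> Items { get; }
    public override string GetRandomSentence() =>
        Items[_random.Next(Items.Count)].GetRandomSentence();
}

// Hypothetical demo class, not part of the answer.
public static class TreeDemo
{
    // Hand-built tree for "{Hello,Hi,Hey} {world,earth}".
    public static TreeItem BuildGreeting()
    {
        return new SequenceItem(new List<TreeItem>
        {
            new ChoiceItem(new List<TreeItem>
            {
                new TextItem("Hello"), new TextItem("Hi"), new TextItem("Hey"),
            }),
            new TextItem(" "),
            new ChoiceItem(new List<TreeItem>
            {
                new TextItem("world"), new TextItem("earth"),
            }),
        });
    }

    public static void Main()
    {
        TreeItem tree = BuildGreeting();
        for (int i = 0; i < 3; i++)
        {
            // Each call makes fresh random picks, e.g. "Hi earth".
            Console.WriteLine(tree.GetRandomSentence());
        }
    }
}
```

Each call to `GetRandomSentence()` on the root yields one of the six possible combinations, because every `ChoiceItem` picks again on every call.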

This was the easy part. Now let's create a parser.

public class Parser
{
    enum Token { Text, LeftBrace, RightBrace, Comma, EndOfString }

    int _index;
    string _definition;
    Token _token;
    string _text; // If token is Token.Text;

    public TreeItem Parse(string definition)
    {
        _index = 0;
        _definition = definition;
        GetToken();
        return Choice();
    }

    private void GetToken()
    {
        if (_index >= _definition.Length) {
            _token = Token.EndOfString;
            return;
        }
        switch (_definition[_index]) {
            case '{':
                _index++;
                _token = Token.LeftBrace;
                return;
            case '}':
                _index++;
                _token = Token.RightBrace;
                return;
            case ',':
                _index++;
                _token = Token.Comma;
                return;
            default:
                // Text runs until the next special character or the end of the input.
                int startIndex = _index;
                do {
                    _index++;
                } while (_index < _definition.Length && !"{},".Contains(_definition[_index]));
                _text = _definition[startIndex.._index];
                _token = Token.Text;
                return;
        }
    }

    private TreeItem Choice()
    {
        var items = new List<TreeItem>();
        while (_token != Token.EndOfString && _token != Token.RightBrace) {
            items.Add(Sequence());
            if (_token == Token.Comma) {
                GetToken();
            }
        }
        if (items.Count == 0) {
            return new TextItem("");
        }
        if (items.Count == 1) {
            return items[0];
        }
        return new ChoiceItem(items);
    }

    private TreeItem Sequence()
    {
        var items = new List<TreeItem>();
        while (true) {
            if (_token == Token.Text) {
                items.Add(new TextItem(_text));
                GetToken();
            } else if (_token == Token.LeftBrace) {
                GetToken();
                items.Add(Choice());
                if (_token == Token.RightBrace) {
                    GetToken();
                }
            } else {
                break;
            }
        }
        if (items.Count == 0) {
            return new TextItem("");
        }
        if (items.Count == 1) {
            return items[0];
        }
        return new SequenceItem(items);
    }
}

The parser includes a lexer, i.e., a low-level mechanism that splits the input text into tokens. We have four kinds of tokens: text, "{", "}" and ",". We represent these tokens as

enum Token { Text, LeftBrace, RightBrace, Comma, EndOfString }

We have also added an EndOfString token to tell the parser that the end of the input string has been reached. When the token is Text, we store its text in the field _text. The lexer is implemented by the GetToken() method, which has no return value; instead it sets the _token field so that the current token is available to the different parsing methods.
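To make the token stream concrete, here is a minimal standalone sketch of the same splitting rules. It differs from the answer's design: it returns all tokens in a list instead of setting a field one token at a time. `TokenKind`, `LexerSketch`, and `Tokenize` are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Mirrors the answer's Token enum under a different, hypothetical name.
public enum TokenKind { Text, LeftBrace, RightBrace, Comma, EndOfString }

public static class LexerSketch
{
    // Returns every token up front; the answer's GetToken() produces one per call instead.
    public static List<(TokenKind Kind, string Text)> Tokenize(string definition)
    {
        var tokens = new List<(TokenKind Kind, string Text)>();
        int index = 0;
        while (index < definition.Length)
        {
            char c = definition[index];
            if (c == '{') { tokens.Add((TokenKind.LeftBrace, "{")); index++; }
            else if (c == '}') { tokens.Add((TokenKind.RightBrace, "}")); index++; }
            else if (c == ',') { tokens.Add((TokenKind.Comma, ",")); index++; }
            else
            {
                // Text runs until the next special character or the end of the input.
                int start = index;
                while (index < definition.Length && "{},".IndexOf(definition[index]) < 0)
                {
                    index++;
                }
                tokens.Add((TokenKind.Text, definition.Substring(start, index - start)));
            }
        }
        tokens.Add((TokenKind.EndOfString, ""));
        return tokens;
    }

    public static void Main()
    {
        foreach (var token in Tokenize("{Hi,Hey} {world}"))
        {
            Console.WriteLine($"{token.Kind} '{token.Text}'");
        }
    }
}
```

For the input "{Hi,Hey} {world}" this yields ten tokens: LeftBrace, Text 'Hi', Comma, Text 'Hey', RightBrace, Text ' ', LeftBrace, Text 'world', RightBrace, EndOfString. The single space between the two brace groups is an ordinary text token, which is why it survives into the generated sentences.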