Extracting text from RTF with text and image

Time:12-20

I have a byte array taken from a WPF RichTextControl, and I need to extract its text. The following code works for that:

FlowDocument document = new FlowDocument();
TextRange txtRange = null;
using (MemoryStream stream = new MemoryStream(data))
{
    txtRange = new TextRange(document.ContentStart, document.ContentEnd);
    txtRange.Load(stream, DataFormats.XamlPackage);
}

The problem starts when an image is embedded in the RTF. I would still like to extract the text, but the code above fails with a XamlParseException in the Load call.

I tried the following approach:

using (RichTextBox rtb = new RichTextBox())
{
  rtb.Rtf = System.Text.Encoding.Default.GetString(data);
  // use rtb.Text
}

but setting rtb.Rtf fails with an ArgumentException. The reason is probably explained here: GetString does not return the expected RTF format but mixed text/binary data with mentions of XAML (the same format is also returned for text-only content, which the first method extracted successfully). I cannot upgrade the framework.

I don't mind traversing the FlowDocument tree to extract the text, as long as I can find a way to load the document successfully.

Is there another way to read the RTF?

CodePudding user response:

Apparently, when an image is included in the RTF, the same code works if it runs on an STA thread, e.g.:

Thread t = new Thread(() => Foo(data));
t.SetApartmentState(ApartmentState.STA); // must be set before Start()
t.Start();
t.Join();

void Foo(byte[] data)
{
  FlowDocument document = new FlowDocument();
  TextRange txtRange = null;
  using (MemoryStream stream = new MemoryStream(data))
  {
      txtRange = new TextRange(document.ContentStart, document.ContentEnd);
      txtRange.Load(stream, DataFormats.XamlPackage);
  }
  // The plain text is available from the range once the load has succeeded.
  string text = txtRange.Text;
}
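
If the caller needs the text back from the STA thread, something like the following might work. This is only a sketch: the class name XamlPackageText and the method ExtractText are made up for illustration, and it assumes the byte array really is in XamlPackage format, as in the question. It runs the same Load call on a dedicated STA thread, captures any exception there, and rethrows it on the calling thread after Join.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Windows;            // DataFormats
using System.Windows.Documents;  // FlowDocument, TextRange

static class XamlPackageText
{
    // Hypothetical helper (not from the original answer): loads XamlPackage
    // bytes on a dedicated STA thread and returns the document's plain text.
    public static string ExtractText(byte[] data)
    {
        string text = null;
        Exception error = null;

        Thread worker = new Thread(() =>
        {
            try
            {
                FlowDocument document = new FlowDocument();
                TextRange range = new TextRange(document.ContentStart, document.ContentEnd);
                using (MemoryStream stream = new MemoryStream(data))
                {
                    range.Load(stream, DataFormats.XamlPackage);
                }
                text = range.Text;
            }
            catch (Exception ex)
            {
                // Keep the failure so it can be reported on the calling thread.
                error = ex;
            }
        });

        worker.SetApartmentState(ApartmentState.STA); // must be set before Start()
        worker.Start();
        worker.Join();

        if (error != null)
        {
            throw new InvalidOperationException("Loading the XamlPackage failed.", error);
        }
        return text;
    }
}
```

Usage would then be a single call such as string text = XamlPackageText.ExtractText(data);. Catching the exception inside the thread matters because an unhandled exception on a worker thread would otherwise take down the process instead of reaching the caller.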