Oracle Outside In: Parse SCCCA content

I am using Oracle Outside In to output the text content of a PDF document.

I am passing the following arguments to the main function in CASample.c, which comes with the Content Access package from https://www.oracle.com/middleware/technologies/outside-in-technology-downloads.html#

C:\adobe-acrobat.pdf -u C:\adobe-acrobat.txt

This gives me output in the following format:

SCCCA_TEXT: dwSubType = 0x08020001, Number of Characters = 8, Character Set = 0x00030100.
    Outside 
SCCCA_TEXT: dwSubType = 0x08020001, Number of Characters = 3, Character Set = 0x00030100.
    In 
SCCCA_TEXT: dwSubType = 0x08020001, Number of Characters = 8, Character Set = 0x00030100.
    Unlocks 
SCCCA_TEXT: dwSubType = 0x08020001, Number of Characters = 9, Character Set = 0x00030100.
    Business 
SCCCA_TEXT: dwSubType = 0x08020001, Number of Characters = 10, Character Set = 0x00030100.
    Documents 
SCCCA_TEXT: dwSubType = 0x08020001, Number of Characters = 4, Character Set = 0x00030100.
    for 
SCCCA_TEXT: dwSubType = 0x08020002, Number of Characters = 1, Character Set = 0x00030100.

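Each SCCCA_TEXT header line states how many characters the text line after it contains (for example, "Outside " with its trailing space is 8 characters). The following is a minimal sketch that reads those fields back with sscanf. It assumes the header layout is exactly as printed above; the format string is inferred from this sample, not taken from Oracle's documentation, and parse_sccca_text_header and sccca_text_header are hypothetical names.

```c
#include <stdio.h>

/* Hypothetical struct: fields from one "SCCCA_TEXT:" header line as
 * printed by CASample in the sample output. */
typedef struct {
    unsigned long sub_type;   /* dwSubType, printed in hex   */
    unsigned long num_chars;  /* Number of Characters        */
    unsigned long char_set;   /* Character Set, printed in hex */
} sccca_text_header;

/* Returns 1 if line matches the assumed header layout, otherwise 0.
 * The literal text must match exactly; "0x" is consumed literally and
 * the remaining digits are read as hex by %lx. */
int parse_sccca_text_header(const char *line, sccca_text_header *h)
{
    return sscanf(line,
                  "SCCCA_TEXT: dwSubType = 0x%lx, "
                  "Number of Characters = %lu, "
                  "Character Set = 0x%lx",
                  &h->sub_type, &h->num_chars, &h->char_set) == 3;
}
```

A caller could, for instance, compare num_chars with the length of the next line (after its four-space indent) as a sanity check.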
How do I get only the text, without the metadata? Instead of the output above, I only need "Outside In Unlocks Business Documents for". Or do I have to write my own parser to extract that text?
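If you keep CASample's output as is, a small post-processing filter is one way to get just the text. This is a minimal sketch under stated assumptions: strip_sccca_metadata is a hypothetical helper (not part of Outside In), and it assumes every metadata line starts with "SCCCA_" at column 0 and every text line is indented by four spaces, as in the sample output above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Outside In. Copies the text lines of
 * CASample-style output into out (at most outsz - 1 characters plus a
 * terminating '\0'), skipping every line that starts with "SCCCA_" and
 * removing the four-space indent in front of each text line.
 * Returns the number of characters written, excluding the '\0'. */
size_t strip_sccca_metadata(const char *in, char *out, size_t outsz)
{
    size_t n = 0;

    if (outsz == 0)
        return 0;

    while (*in != '\0') {
        const char *eol = strchr(in, '\n');
        size_t len = eol ? (size_t)(eol - in) : strlen(in);

        /* Assumption: metadata lines start at column 0 with "SCCCA_". */
        if (strncmp(in, "SCCCA_", 6) != 0) {
            const char *text = in;
            size_t tlen = len;

            /* Assumption: text lines are indented by four spaces. */
            if (tlen >= 4 && strncmp(text, "    ", 4) == 0) {
                text += 4;
                tlen -= 4;
            }
            /* Drop a trailing '\r' left by Windows line endings. */
            if (tlen > 0 && text[tlen - 1] == '\r')
                tlen--;
            /* Truncate instead of overflowing out. */
            if (tlen > outsz - 1 - n)
                tlen = outsz - 1 - n;

            memcpy(out + n, text, tlen);
            n += tlen;
        }

        in += len;          /* move to the '\n' or the final '\0' */
        if (*in == '\n')
            in++;
    }

    out[n] = '\0';
    return n;
}
```

To use it, read the .txt file written by CASample into a buffer and pass that buffer as in. For the sample above, the result is "Outside In Unlocks Business Documents for " (the trailing space comes from the last text line).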

Answer:

The download also includes a tademo.vcxproj project that extracts text. It is a desktop application, but you can convert it into a library.

https://www.oracle.com/middleware/technologies/outside-in-technology-downloads.html#

After converting it to a library, I added the following function to tademo.c. It takes the path of the input file and writes the extracted text to the output path.

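int callableMain(char* inputPath, char* outputPath) {
    /* Copy the path into tademo's global buffer; strncpy does not
       null-terminate if inputPath has PATHSIZE or more characters. */
    strncpy(szInputPath, inputPath, PATHSIZE);
    szInputPath[PATHSIZE - 1] = '\0';

    /* Initialize the Outside In Data Access layer, without threads. */
    if (DAInitEx(SCCOPT_INIT_NOTHREADS, OI_INIT_DEFAULT) != DAERR_OK)
        return 0;

    DoTextClose();             /* tademo: close any previously opened document */
    dwBlockNum = 0;            /* tademo: reset the block counter */
    DoTextOpen(1);             /* tademo: open the file named in szInputPath */
    DoSaveTextAs(outputPath);  /* tademo: save the extracted text to outputPath */
    DoTextClose();

    DADeInit();                /* balance the DAInitEx call above */
    return 1;
}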