Quick way to extract the information from .xml files into an object

Time:04-08

I am a beginner, and right now I am trying to extract the key information from an .xml file and load it into an object of my class. For example:

Here is some of the information in the .xml file:

<row Id="17" Phone="12468" Address="Bos" />
<row Id="242" Phone="98324" Address="Chi" Age="30"/>
<row Id="157" Phone="23268" Age="25" />
<row Id="925" Phone="54325" Address="LA" />

And my class would be:

class worker {
public:
    std::string Id;
    std::string Phone;
    std::string Address;
    std::string Age;
};

I know the attributes vary from row to row: if a row lacks an attribute, the corresponding field should be set to "" (an empty string). I also know that the attributes are always given in the same order as the fields of the class. I tried to implement a function, say extractInfo(const string& line, const string& key):

// @line: the whole line read from the .xml file
// @key:  "Id=\"", "Phone=\"", "Address=\"" or "Age=\"", i.e. the attribute name
//        followed by the opening quote, so the value starts right after the key.
// Returns "" if the attribute does not appear in the line.

std::string extractInfo(const std::string& line, const std::string& key) {
    std::string::size_type index = line.find(key);
    if (index == std::string::npos) return "";

    std::string::size_type start = index + key.length();  // first character of the value
    std::string::size_type end = start;
    while (end < line.size() && line[end] != '"') {       // advance to the closing quote
        ++end;
    }

    return line.substr(start, end - start);
}

int main() {
    // ... for each line read from the .xml file, build a new worker object and fill its fields:
    worker w;
    w.Id = extractInfo(line, "Id=\"");
    w.Phone = extractInfo(line, "Phone=\"");
    // ... etc.
    // ... then work on other manipulations
    return 0;
}

My questions are: is there a way to read and load the information from the XML file much more quickly, using some other API or functions? In particular, how could I improve this function when the .xml file is huge (terabytes)? And is there a way to use less memory when, for example, finding the oldest worker and printing it out? I know this is tough for me, but I am still working hard on it!

Thanks in advance for any ideas and advice!

Answer:

I would use the Boost libraries for this. They are well maintained and optimized, and you can build them yourself, selecting only the components you need. Here is an example of how to read XML files with Boost: https://www.boost.org/doc/libs/1_63_0/doc/html/property_tree/tutorial.html

Answer:

You can parse XML with existing XML parsing libraries, such as rapidxml, libxml2, etc.

Note that for a huge XML file the DOM approach is not really suitable, because it has to read the entire XML content into memory to build the DOM tree. Instead, you can use libxml2's xmlreader, which streams through the file and lets you process the nodes one at a time. Reference: libxml2 xml reader

#include <cstdio>
#include <libxml/xmlreader.h>

static void
streamFile(const char *filename) {
    xmlTextReaderPtr reader;
    int ret;

    reader = xmlReaderForFile(filename, NULL, 0);
    if (reader != NULL) {
        ret = xmlTextReaderRead(reader);
        while (ret == 1) {
            const xmlChar *name = xmlTextReaderConstName(reader);
            // Only handle start tags (including self-closing <row .../>), not </row>.
            if (xmlTextReaderNodeType(reader) == XML_READER_TYPE_ELEMENT &&
                xmlStrEqual(BAD_CAST "row", name)) {
                // Each call returns a newly allocated copy, or NULL if the attribute is absent.
                xmlChar *id = xmlTextReaderGetAttribute(reader, BAD_CAST "Id");
                xmlChar *phone = xmlTextReaderGetAttribute(reader, BAD_CAST "Phone");
                // your code here...
                xmlFree(id);
                xmlFree(phone);
            }
            ret = xmlTextReaderRead(reader);
        }
        xmlFreeTextReader(reader);
        if (ret != 0) {
            fprintf(stderr, "%s : failed to parse\n", filename);
        }
    } else {
        fprintf(stderr, "Unable to open %s\n", filename);
    }
}

And if your XML format is always like the one above, you can also use std::regex_search to handle it: https://en.cppreference.com/w/cpp/regex/regex_search

#include <iostream>
#include <string>
#include <regex>
 
int main()
{
    std::string str = R"(<row Id="17" Phone="12468" Address="Bos" />)";
 
    // (\w+)="(\w+)": attribute name and value (values containing spaces will not match)
    std::regex regex("(\\w+)=\"(\\w+)\"");
 
    // get all tokens
    std::smatch result;
    while (std::regex_search(str, result, regex))
    {
        std::cout << result[1] << ": " << result[2] << std::endl;
        str = result.suffix().str();
    }
}