I'm writing my graduation thesis and have run into a problem with building a monolingual corpus. Time is short, so I would appreciate any advice. Thank you very much!
CodePudding user response:
Search Baidu for: (1) "VC file search" or "VC directory traversal", to find all the .xml files; (2) "VC read and write XML file", to read each XML file and write its content (not including the tags) to a TXT file.
CodePudding user response:
This involves:
1. folder traversal,
2. file I/O.
For the traversal, if stack overflow is not a concern you can simply recurse. If you want to avoid a stack overflow, use a container instead: put each folder path you discover into the container, and take a folder's path out of the container when you traverse that folder.
As for removing the tags, you need to learn how to recognize a tag in a string and the content it encloses, and then write only that content to the TXT file.
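A rough sketch of the tag-removal step is below, assuming C++17. The names `stripTags` and `convertXmlToTxt` are hypothetical, and the scan is deliberately naive: it drops everything between `<` and `>`, so it does not decode entities such as `&amp;`, and it does not treat comments, CDATA sections, or a `>` inside an attribute value specially. For real corpus work a proper XML parser is probably the safer choice.

```cpp
#include <filesystem>
#include <fstream>
#include <sstream>
#include <string>

// Return a copy of `xml` with every "<...>" span removed.
std::string stripTags(const std::string& xml) {
    std::string text;
    bool insideTag = false;
    for (char c : xml) {
        if (c == '<') {
            insideTag = true;       // a tag starts here
        } else if (c == '>' && insideTag) {
            insideTag = false;      // the tag ends here
        } else if (!insideTag) {
            text += c;              // ordinary content, keep it
        }
    }
    return text;
}

// Read one XML file, strip its tags, and write the rest to a TXT file.
// Returns false if either file could not be opened.
bool convertXmlToTxt(const std::filesystem::path& xmlPath,
                     const std::filesystem::path& txtPath) {
    std::ifstream in(xmlPath, std::ios::binary);
    if (!in) {
        return false;
    }
    std::ostringstream buffer;
    buffer << in.rdbuf();           // load the whole file into memory

    std::ofstream out(txtPath, std::ios::binary);
    if (!out) {
        return false;
    }
    out << stripTags(buffer.str());
    return true;
}
```

The files are opened in binary mode so the bytes pass through unchanged; if the XML files are not UTF-8, the encoding would need to be converted separately.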
CodePudding user response: