How can Delphi parse the following data format into a database (MDB) most efficiently?

Time: 10-04

The text data format is shown below. The value after the characteristic marker "Name:" differs from block to block, and I only need to parse the VRTX data lines between ILINE and END. Reading the file line by line works fine for ordinary amounts of data, but for particularly large files (all in the same format) it is very inefficient; a file is usually about 20 MB to 100 MB. I'm looking for expert advice.

GOCAD PLine 1
HEADER {
Name: bounding_pline_pm - 0920
Color: 0.8 0.8 0.1 1.0
Atoms: off
Cn: off
Line_visible: on
Name_in_model_list: bounding_pline_pm - 0920
}
GEOLOGICAL_TYPE boundary
ILINE
VRTX 1 3274.65185547 3388.31103516 600
VRTX 2 2780.47558594 2596.45825195 600
VRTX 3 2367.03491211 2222.10327148 600
VRTX 4 2253.71923828 2140.22363281 600
VRTX 5 2106.87524414 1845.48815918 600
VRTX 6 2309.38818359 1265.0916748 600
VRTX 7 2309.38818359 1265.0916748 0
VRTX 8 2106.87524414 1845.48815918 0
VRTX 9 2253.71923828 2140.22363281 0
VRTX 10 2367.03491211 2222.10327148 0
VRTX 11 2780.47558594 2596.45825195 0
VRTX 12 3274.65185547 3388.31103516 0
SEG 1 2
SEG 2 3
SEG 3 4
SEG 4 5
SEG 5 6
SEG 6 7
SEG 7 8
SEG 8 9
SEG 9 10
SEG 10 11
SEG 11 12
SEG 12 1
END
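The basic line-by-line parse the question describes can be sketched as follows. This is an illustrative Python sketch (the thread is about Delphi; the same control flow would translate to a `TStringList`/`ReadLn` loop), assuming the block layout shown above: a `Name:` header entry, then VRTX lines between ILINE and END. The function name and tuple layout are my own.

```python
# Sketch: parse one GOCAD PLine block line by line.
# Assumes the format shown above; names here are illustrative.

def parse_block(lines):
    """Return (name, vertices) for one GOCAD PLine block."""
    name = None
    vertices = []          # list of (index, x, y, z)
    in_geometry = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("Name:") and name is None:
            name = line.split(":", 1)[1].strip()
        elif line == "ILINE":
            in_geometry = True
        elif line == "END":
            break
        elif in_geometry and line.startswith("VRTX"):
            parts = line.split()
            # VRTX <index> <x> <y> <z>
            vertices.append((int(parts[1]), *map(float, parts[2:5])))
    return name, vertices

sample = """GOCAD PLine 1
HEADER {
Name: bounding_pline_pm - 0920
}
ILINE
VRTX 1 3274.65185547 3388.31103516 600
VRTX 2 2780.47558594 2596.45825195 600
SEG 1 2
END""".splitlines()

name, verts = parse_block(sample)
# name == "bounding_pline_pm - 0920", len(verts) == 2
```

Note that `Name_in_model_list:` does not match the `Name:` test, so only the header's `Name:` entry is picked up.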

CodePudding user response:

I don't think the bottleneck is in parsing the text file, because I/O on a text file is fast. Consider whether the whole program is slow because the database inserts are slow. For example, commit once every 1000 records instead of committing every record.
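The batching idea above can be sketched as follows. This uses sqlite3 purely as a stand-in for the MDB/ADO connection (the thread is about Delphi and Access); the table name and column names are invented for illustration. The point is only the pattern: one commit per batch, not per row.

```python
# Sketch of batched commits, with sqlite3 standing in for MDB/ADO.
import sqlite3

def insert_batched(conn, rows, batch_size=1000):
    cur = conn.cursor()
    for i, row in enumerate(rows, 1):
        cur.execute("INSERT INTO vrtx (idx, x, y, z) VALUES (?, ?, ?, ?)", row)
        if i % batch_size == 0:
            conn.commit()        # one commit per 1000 rows, not per row
    conn.commit()                # flush the final partial batch

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vrtx (idx INTEGER, x REAL, y REAL, z REAL)")
insert_batched(conn, [(i, 0.0, 0.0, 0.0) for i in range(2500)])
# table now holds 2500 rows
```

In Delphi with ADO the equivalent lever would be grouping inserts inside explicit transactions rather than autocommitting each `INSERT`.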

CodePudding user response:

I tested it; it is not just the inserts that are slow, the data parsing also takes time.

CodePudding user response:

Since a text file is read sequentially from front to back, it is better to read and judge as you go; reading per line and then re-scanning actually reads the data at least twice.

CodePudding user response:

Can multithreading be used on the text data? I haven't tried it.

CodePudding user response:

Quoting lusj586 (reply #4):
Can multithreading be used on the text data? I haven't tried it.

Of course it can, and multithreading is not that complicated.

Besides, for handling 100 MB of data, what is your ideal efficiency? And what is your bottom line?

CodePudding user response:

Regarding your analysis results:
parsing speed mainly depends on whether the parsing method is reasonable; unnecessary tests increase the computational workload.

CodePudding user response:

You could first process the data into a custom format and then load that into the MDB database.

CodePudding user response:

Quoting reply #5:
Of course it can, and multithreading is not that complicated.

Besides, for handling 100 MB of data, what is your ideal efficiency? And what is your bottom line?


I also feel that multithreading should be able to solve this problem, but I know too little about it and don't have much experience. Could you suggest an approach?
I hope 100 MB of data can be processed within three minutes; the current algorithm takes more than five minutes.
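One way to structure the multithreaded approach asked about above: split the file into GOCAD blocks first (a cheap sequential scan), then hand the blocks to a worker pool. This is a sketch, not the poster's code; in Delphi the workers would be `TThread` instances, and here a `ThreadPoolExecutor` stands in just to show the decomposition. The worker shown only counts VRTX lines to keep the example small.

```python
# Sketch: split into blocks sequentially, parse blocks in parallel.
from concurrent.futures import ThreadPoolExecutor

def split_blocks(text):
    """Split the file text into one string per GOCAD block (ends at END)."""
    blocks, current = [], []
    for line in text.splitlines():
        current.append(line)
        if line.strip() == "END":
            blocks.append("\n".join(current))
            current = []
    return blocks

def count_vertices(block):
    """Worker: parse one block (here, just count its VRTX lines)."""
    return sum(1 for ln in block.splitlines() if ln.lstrip().startswith("VRTX"))

text = "ILINE\nVRTX 1 0 0 0\nEND\nILINE\nVRTX 1 0 0 0\nVRTX 2 0 0 0\nEND"
with ThreadPoolExecutor(max_workers=4) as pool:
    totals = list(pool.map(count_vertices, split_blocks(text)))
# totals == [1, 2]
```

Whether this beats a single-pass parse depends on where the time really goes; if the bottleneck is the database inserts rather than the parsing, threads won't help much.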

CodePudding user response:

Quoting YFLK (reply #6):

Regarding your analysis results:
parsing speed mainly depends on whether the parsing method is reasonable; unnecessary tests increase the computational workload.


My approach is to search for a characteristic value to locate the data, then read and test line by line until the END tag, then search for the next characteristic value and parse it again. I believe there is no double counting this way, but searching for a characteristic value in a text file is a sequential search. I don't know of a better idea.

CodePudding user response:

Quoting sgzhou12345 (reply #7):
You could first process the data into a custom format and then load that into the MDB database.


What exactly does this custom format refer to?

CodePudding user response:

Let me say a couple of things: 100 MB of TXT into MDB format? If you do it often, the system probably can't stand it; if it's only once or twice, does it matter that it's slow?

Also, I feel you could generate a TXT in a standard format and import it with a single command (one TXT corresponding to one database table).

Finally, one-off insertion of large amounts of data has never been a database design goal!

CodePudding user response:

Quoting liups (reply #11):
Let me say a couple of things: 100 MB of TXT into MDB format? If you do it often, the system probably can't stand it; if it's only once or twice, does it matter that it's slow?

Also, I feel you could generate a TXT in a standard format and import it with a single command (one TXT corresponding to one database table).

Finally, one-off insertion of large amounts of data has never been a database design goal!


The TXT data is the exchange-format file exported by GOCAD, and other programs also use this file. I generate the MDB because my own program uses the database; a lot of other calculations are also involved after the data is stored.

CodePudding user response:

The original poster is too stubborn.
I once did this: when exporting data from Oracle, a direct export failed because there was something wrong with the original data, so I used the query tool's save-as-TXT instead and got TXT files of hundreds of MB. I then used a small VFP program (VFP is not fast, but today's machines certainly are!) to restructure the TXT for import: it reads the source TXT and at the same time writes a standard-format TXT, which is then imported into the database. The result was very fast!
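The restructure-then-bulk-import idea above can be sketched like this: one pass over the source TXT that emits a flat file (CSV here) the database can import in a single command. This is an illustrative Python sketch, not the VFP program; the `name,idx,x,y,z` column layout is my own assumption.

```python
# Sketch: flatten GOCAD VRTX rows into a CSV for bulk import.
import csv
import io

def restructure(src, dst):
    """Copy VRTX rows out of a GOCAD TXT as name,idx,x,y,z CSV rows."""
    writer = csv.writer(dst)
    name = None
    for raw in src:
        line = raw.strip()
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VRTX"):
            writer.writerow([name] + line.split()[1:5])

src = io.StringIO("Name: a\nILINE\nVRTX 1 10 20 0\nEND\n")
dst = io.StringIO()
restructure(src, dst)
# dst now holds one CSV row: a,1,10,20,0
```

The resulting flat file maps one-to-one onto a table, so the database's own import path does the heavy lifting instead of row-by-row inserts.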

CodePudding user response:

Quoting liups (reply #13):
The original poster is too stubborn.
I once did this: when exporting data from Oracle, a direct export failed because there was something wrong with the original data, so I used the query tool's save-as-TXT instead and got TXT files of hundreds of MB. I then used a small VFP program (VFP is not fast, but today's machines certainly are!) to restructure the TXT for import: it reads the source TXT and at the same time writes a standard-format TXT, which is then imported into the database. The result was very fast!

Ha ha, it's not that I'm stubborn; maybe I wasn't being clear. The original data is structured as follows: each block starts with GOCAD PLine 1 and ends with END, and the Name: value differs between blocks. A single TXT can contain dozens or even hundreds of such blocks. What I need to do is retrieve blocks by their different Name: values and then parse the VRTX data lines in each block into the database. The format after VRTX is fixed and easy to interpret, but there can be dozens to hundreds of such data lines.
GOCAD PLine 1
HEADER {
Name: bounding_pline_pm - 0920
Color: 0.8 0.8 0.1 1.0
Atoms: off
Cn: off
Line_visible: on
Name_in_model_list: bounding_pline_pm - 0920
}
GEOLOGICAL_TYPE boundary
ILINE
VRTX 1 3274.65185547 3388.31103516 600
...
VRTX 12 3274.65185547 3388.31103516 0
SEG 1 2
...
SEG 12 1
END

CodePudding user response:

I wonder whether parsing with regular expressions would be fast.
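For what a regex-based parse of the VRTX lines might look like, here is a small Python sketch (again as an illustration of the technique, not Delphi code). For a fixed whitespace-separated format, plain splitting is usually at least as fast; the regex mainly buys validation of malformed lines.

```python
# Sketch: regex-based parsing of a single VRTX line.
import re

VRTX_RE = re.compile(r"^VRTX\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)")

def parse_vrtx(line):
    """Return (index, x, y, z) for a VRTX line, or None if it isn't one."""
    m = VRTX_RE.match(line)
    if not m:
        return None
    idx, x, y, z = m.groups()
    return int(idx), float(x), float(y), float(z)

parse_vrtx("VRTX 1 3274.65185547 3388.31103516 600")
# -> (1, 3274.65185547, 3388.31103516, 600.0)
```

Compiling the pattern once outside the loop matters if a regex is used at all; recompiling per line would itself become a bottleneck on a 100 MB file.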

CodePudding user response:

Quoting lusj586 (reply #9):
Quoting YFLK (reply #6):

Regarding your analysis results:
parsing speed mainly depends on whether the parsing method is reasonable; unnecessary tests increase the computational workload.


My approach is to search for a characteristic value to locate the data, then read and test line by line until the END tag, then search for the next characteristic value and parse it again. I believe there is no double counting this way, but searching for a characteristic value in a text file is a sequential search. I don't know of a better idea.


I think your approach is right.
Search for the characteristic value to locate the data, then read word by word, testing and parsing as you go... (like how a lexer works).
There is no better way, unless you know within how many bytes after END the next characteristic value cannot appear.