Common substring of two strings

Time:10-06

There are about 4 million records, and each record is a string of 40 digit characters. I need to find any two strings that share a common substring longer than 10 characters and output those two strings, but it takes half a day just to find one pair.
My approach is clumsy: I put all the strings into an array and then process them. Is there a better way?

Reply:

That sounds like an exhaustive comparison of every pair of strings, and I can't think of a more efficient way. 4 million records of 40 characters come to less than 200 MB, so reading them all into memory shouldn't be a problem.

Reply:

Search for "suffix array", "dynamic programming" and "LCS".
The idea is to put the suffixes of all the strings into one suffix array and then sort it.
VB does not support pointers, so an implementation in VB would be very inefficient.

Reply:

Does this work better?

Public Function CompStr(Str1 As String, Str2 As String, Optional ByVal Length As Integer = 10) As Boolean
    ' Str1, Str2: the two strings to compare
    ' Length: returns True if the strings share a substring longer than Length characters, otherwise False
    Dim I As Integer, subStr As String
    For I = 1 To Len(Str1) - Length
        subStr = Mid(Str1, I, Length + 1)
        If InStr(Str2, subStr) > 0 Then Exit For
    Next
    ' After a full loop I = Len(Str1) - Length + 1; use <= so a match in the last window still counts
    CompStr = CBool(I <= Len(Str1) - Length)
End Function

Reply:

What is the point of "putting all the characters into an array"?
If you mean loading all the strings, that is reasonable,
but only if you have enough memory.
Each record is 40 characters and there are 4 million of them,
which takes more than 320 MB.
If free memory runs short, the data gets paged out to the swap file,
and your "data loading" step could take a very long time.

"It takes half a day to find one" suggests your search method may be wrong.
You'd better post some sample data
and explain which pairs count as a match and which do not.

Reply:

Quoting bcrun (post #1):
That sounds like an exhaustive comparison of every pair of strings, and I can't think of a more efficient way. 4 million records of 40 characters come to less than 200 MB, so reading them all into memory shouldn't be a problem.

A 40-character string takes 80 bytes (VB strings use 2 bytes per character) plus a 4-byte length prefix.
With 4 million of them, that is about 336 MB.
A string array with 4 million elements takes about another 16 MB.
So a rough estimate is about 350 MB.
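The estimate above can be written out as a quick calculation. This follows the poster's accounting (UTF-16 text plus a 4-byte length prefix per string, and a 4-byte pointer per array element in 32-bit VB); it ignores the 2-byte terminator each VB string also carries and any heap-allocation overhead, so the real figure is somewhat higher.

```python
RECORDS = 4_000_000
CHARS_PER_RECORD = 40

# Per string: 2 bytes per character (UTF-16) plus a 4-byte length prefix.
bytes_per_string = CHARS_PER_RECORD * 2 + 4      # 84 bytes
string_data = RECORDS * bytes_per_string          # 336,000,000 bytes

# The array itself holds one 4-byte pointer per element.
array_slots = RECORDS * 4                         # 16,000,000 bytes

total_mb = (string_data + array_slots) / 1_000_000
print(f"about {total_mb:.0f} MB")                 # prints "about 352 MB"
```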

Reply:

Since the common substring has to be longer than 10 characters, split each string into all of its substrings of length 11 (30 of them for a 40-character string).

That should do it.
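The splitting idea turns the problem into exact-match lookups, which a hash table handles in a single pass. A minimal Python sketch, assuming every record consists only of digits and `k = 11`: each 11-digit window is converted to an integer key (safe because all windows have the same length, so leading zeros cannot make two different windows collide), and a dictionary maps each key to the set of record IDs containing it. Any key shared by two or more IDs yields matching pairs. The function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def pairs_via_kgram_index(records, k=11):
    """Pairs of record indices sharing a k-digit substring (digit-only records)."""
    index = defaultdict(set)  # window value -> IDs of records containing it
    for rid, s in enumerate(records):
        for i in range(len(s) - k + 1):
            index[int(s[i:i + k])].add(rid)
    result = set()
    for ids in index.values():
        if len(ids) > 1:
            result.update(combinations(sorted(ids), 2))
    return result
```

For 4 million records this means about 120 million (window, ID) entries, far more than a Python dictionary handles comfortably in 4 GB, so a real run would need a more compact structure. Also note that a window shared by many records produces a quadratic number of output pairs.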

Reply:

Replying to Chen8013 (post #4):

My computer has 4 GB of memory, and the whole record set can be loaded into the array; that takes about ten minutes. But at run time I get a stack overflow at line 0.
Tested with a small amount of data, the program logic is fine; the problem is speed. With 250 records it took about 10 seconds to find the matches. With 20,000 records the program still hadn't finished after half a day, so I just killed it. I can't afford to wait that long!
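A rough extrapolation of these timings, assuming the run time is proportional to the number of string pairs compared (the program itself was not posted, so this is only an estimate):

```python
def pair_count(n):
    """Number of unordered pairs among n records."""
    return n * (n - 1) // 2

# Measured in the post: 250 records took about 10 seconds.
seconds_per_pair = 10 / pair_count(250)

def estimated_seconds(n):
    return pair_count(n) * seconds_per_pair

hours_20k = estimated_seconds(20_000) / 3600
years_4m = estimated_seconds(4_000_000) / (365.25 * 24 * 3600)
print(f"20,000 records: ~{hours_20k:.0f} hours")
print(f"4,000,000 records: ~{years_4m:.0f} years")
```

Under that assumption, 20,000 records need roughly 18 hours, which fits the program not finishing in half a day, and all 4 million records would need on the order of 80 years.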

Reply:

Asking for analysis without posting the data or the code is just not playing fair!

Reply:

Quoting u010574425 (post #7):

My computer has 4 GB of memory, and the whole record set can be loaded into the array; that takes about ten minutes. But at run time I get a stack overflow at line 0.
Tested with a small amount of data, the program logic is fine; the problem is speed. With 250 records it took about 10 seconds to find the matches. With 20,000 records the program still hadn't finished after half a day, so I just killed it. I can't afford to wait that long!

"About ten minutes"?
It should be about 10 seconds! Your 4 million records are only 160 MB.
"Nothing wrong with the program logic" does not mean "the efficiency is good too".

Reply:

Create a database.

Split each 40-byte record in table 1 into 31 new records in table 2: one field holds the substring (10 bytes long), the other holds the ID of the record in the original table.

First query table 2 for the substrings whose group count is greater than 1, then query table 2 for the rows whose substring field is in that result set. Of course, the first step can be written as a subquery so that it all runs as one query.

At this point you have the IDs of the matching records.
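The two-table approach can be tried out on a small scale with SQLite from Python's standard library. This is a sketch under assumptions: the table and column names (`t1`, `t2`, `sub`, `id`) are hypothetical, the window length is a parameter (the post uses 10, which gives 31 rows per 40-character record; use 11 for "longer than 10"), and `COUNT(DISTINCT id)` replaces a plain count so that a substring repeated inside a single record is not reported as a match. The index on `t2(sub)` is there so the grouping and the `IN` lookup do not have to scan the whole table repeatedly.

```python
import sqlite3
from collections import defaultdict
from itertools import combinations

def pairs_via_sql(records, k=10):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (id INTEGER PRIMARY KEY, s TEXT)")
    con.execute("CREATE TABLE t2 (sub TEXT, id INTEGER)")
    con.executemany("INSERT INTO t1 (id, s) VALUES (?, ?)", enumerate(records))
    # Table 2: one row per window of length k, tagged with the original record ID.
    con.executemany(
        "INSERT INTO t2 (sub, id) VALUES (?, ?)",
        ((s[i:i + k], rid) for rid, s in enumerate(records)
         for i in range(len(s) - k + 1)),
    )
    con.execute("CREATE INDEX t2_sub ON t2 (sub)")
    # Subquery: substrings found in more than one record.
    # Outer query: every row of table 2 whose substring is in that set.
    rows = con.execute(
        """SELECT sub, id FROM t2
           WHERE sub IN (SELECT sub FROM t2 GROUP BY sub
                         HAVING COUNT(DISTINCT id) > 1)"""
    ).fetchall()
    con.close()
    # Collect the record IDs per substring and emit every pair.
    groups = defaultdict(set)
    for sub, rid in rows:
        groups[sub].add(rid)
    result = set()
    for ids in groups.values():
        result.update(combinations(sorted(ids), 2))
    return result
```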

Reply:

With 4 million records, working through a "query" like that,
are you going to spend a few hours "waiting for it to finish"?
-_-!!!!!!

Reply:

Replying to Chen8013 (post #11) and of123 (post #10):

I understand what you mean. If I load the 4 million records into Access and split each into 31 substrings, that gives about 120 million records. Roughly how long would the grouping take?
I'd like to use SQL, but I don't know it well.
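On the question of how long grouping 120 million rows takes: grouping means bringing equal keys together, either by hashing or by sorting, and a sort-based version can also be done outside a database. The sketch below is a hypothetical illustration, assuming digit-only records, `k = 11` and fewer than 2^22 (about 4.19 million) records. Each window and its record ID are packed into one integer (window value in the high bits, record ID in the low 22 bits; an 11-digit value is below 2^37, so the key fits in 64 bits), the list is sorted once, and runs of equal windows are scanned. In a language with fixed-size integers, 120 million such keys take about 960 MB as a plain 64-bit array.

```python
from itertools import combinations

RID_BITS = 22  # assumption: record IDs are below 2**22

def pairs_via_sorted_keys(records, k=11):
    """Pairs of record indices sharing a k-digit window, via one sort."""
    keys = []
    for rid, s in enumerate(records):
        for i in range(len(s) - k + 1):
            # High bits: numeric value of the window; low bits: record ID.
            keys.append((int(s[i:i + k]) << RID_BITS) | rid)
    keys.sort()
    mask = (1 << RID_BITS) - 1
    result = set()
    run_window, run_ids = None, set()
    for key in keys:
        window, rid = key >> RID_BITS, key & mask
        if window != run_window:
            # A new window value starts: emit pairs for the finished run.
            if len(run_ids) > 1:
                result.update(combinations(sorted(run_ids), 2))
            run_window, run_ids = window, set()
        run_ids.add(rid)
    if len(run_ids) > 1:
        result.update(combinations(sorted(run_ids), 2))
    return result
```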