Given two C-language source program listings, use a hash table to count how each program uses the C language keywords, and finally, from the quantitative result, judge how similar the two source listings are.
[basic requirements]
You may build the hash table of C language keywords yourself, or use the hash table in section 8.10 of "Data Structures and Application Algorithms Tutorial" (Yan Weimin and Chen Wenbo, Tsinghua University Press). The main task is to scan each given source program and accumulate the frequency of every C keyword in it: during the scan, look up each keyword in the hash table and increment that keyword's count. To keep lookups efficient, a self-built hash table should have an average search length (ASL) of no more than 2.
Scanning the two source programs and counting the frequency of every keyword yields two vectors, as in the following simple example:
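Keywords counted in the example include void, int, for, char, if, else and while. Each component of X1 and X2 is the frequency of the keyword stored at the corresponding hash address.

Hash address:                  0  1  2  3  4  5  6  7  8  9
Program 1 keyword frequency:   4  3  0  4  3  0  7  0  0  2
Program 2 keyword frequency:   4  2  0  5  4  0  5  2  0  1

X1 = [4, 3, 0, 4, 3, 0, 7, 0, 0, 2]
X2 = [4, 2, 0, 5, 4, 0, 5, 2, 0, 1]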
The similarity of the two source programs is judged by computing the relative distance s between the vectors X1 and X2; in the formula for s, the superscript T denotes the transpose of a vector.
For the example data above, s ≈ 0.13. Clearly, when X1 = X2, s = 0, which suggests the two may be the same program; the larger s is, the greater the difference between the two programs is likely to be.
[test data]
Use several C programs that compile and run correctly, some pairs very similar and some very different; compute s for each pair with the method above and compare the degrees of difference.
[implementation tips]
Because the task scans a large volume of source text and must pick out every keyword, you can build a key tree (trie) of the C language keywords and search it in step with scanning the source program, so that each keyword is recognized as it is read.
[discussion]
This method only provides an auxiliary means of judgment. Even s = 0 does not prove that the two are the same program, and a very large s may still come from exactly the same algorithm: for example, one program uses a while statement and the other a for statement to implement the same function. In practice, when s turns out to be very small, human inspection should make the final distinction.
Reply:
By the way, was this copied?
Reply:
This is my homework.