A=[' convenient store cic international business shop ', 'Mr Fukuda street agency shenzhen futian district 1061 sweet mei road cic international business center layer block B, 12']
B=[' micro store oh (cic building stores) ', 'sweet mei road cic international business center in B1 building 1 layer']
C=[' wide Vingo wealth ', 'shenzhen futian shennan avenue HaoMing fortune plaza building group of 101-1 shops']
D=[' fortune plaza ', 'shenzhen futian shennan avenue east HaoMing fortune plaza']
The first value in each list is the store name and the second value is the store address.
A human can see that A and B are the same store, and that C and D are the same store, but how can this be implemented in code? Does Python have a library for this kind of matching?
Thanks in advance.
CodePudding user response:
Here is one line of thought: compare the text similarity of A and B, and of C and D. If your needs are simple, you can write your own comparison: compute what share of the words the two texts have in common, and treat them as the same store once that share exceeds some percentage. If you need higher precision, you can use an established measure such as edit distance, Hamming distance, Euclidean distance, Jaccard similarity, or cosine similarity. You can search Baidu for how they differ and how each is implemented; ready-made implementations should exist.
CodePudding user response:
A=[' wahaha supermarket ', 'shenzhen futian district Shanghai road wahaha supermarket']
B=[' tissue supermarkets ', 'shenzhen futian district Shanghai road tissue supermarkets']
But in this case A and B are not the same store, so the right degree of strictness is hard to pin down.
With fuzzy matching, the wahaha and tissue supermarkets are grouped as one store; without it, "Vingo wealth" and "fortune plaza" are split into two.
CodePudding user response:
For this case you could try semantic analysis; looking into NLP may turn up more advanced algorithms. In my opinion, it is better to do some processing when the data is collected so that records are easier to tell apart. If that is not possible, you can only handle the special cases one by one. As for the last A and B you gave, I think they are hard to distinguish by computer.
CodePudding user response:
This data is read from Excel, and what I have to do is compare the names and addresses. A loose match does not work and a strict match does not work either, so I do not know what to do and it is giving me a headache.
CodePudding user response:
Use jieba to segment the text into words, convert the words into sets, and take the intersection of the sets.
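A rough sketch of that idea, assuming `jieba` is installed (`pip install jieba`). The helper names `tokenize` and `jaccard` are made up for illustration, and the whitespace fallback is only an assumption that suits space-separated text like the translated samples above. The score is the Jaccard similarity mentioned earlier in the thread: shared words divided by all distinct words.

```python
try:
    import jieba  # third-party word segmenter for Chinese text
except ImportError:
    jieba = None  # assumption: fall back to whitespace splitting


def tokenize(text):
    """Return the set of lowercase word tokens in text."""
    text = text.strip().lower()
    words = jieba.lcut(text) if jieba else text.split()
    # Keep only tokens with at least one letter or digit,
    # which drops the spaces and punctuation jieba returns.
    return {w for w in words if any(ch.isalnum() for ch in w)}


def jaccard(a, b):
    """Jaccard similarity of two strings: |intersection| / |union| of tokens."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    C = [' wide Vingo wealth ', 'shenzhen futian shennan avenue HaoMing fortune plaza building group of 101-1 shops']
    D = [' fortune plaza ', 'shenzhen futian shennan avenue east HaoMing fortune plaza']
    # Compare names and addresses separately; how to combine the two
    # scores and where to put the cut-off is still up to you.
    print("name:", jaccard(C[0], D[0]))
    print("address:", jaccard(C[1], D[1]))
```

This does not solve the wahaha/tissue case by itself, because two different stores on the same road still share most of their address words. You would probably have to weigh the name score more heavily or tune the cut-off on your own data.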
CodePudding user response: