Python address comparison-CodePudding

A=[' convenient store cic international business shop ', 'Mr Fukuda street agency shenzhen futian district 1061 sweet mei road cic international business center layer block B, 12']
B=[' micro store oh (cic building stores) ', 'sweet mei road cic international business center in B1 building 1 layer']

C=[' wide Vingo wealth ', 'shenzhen futian shennan avenue HaoMing fortune plaza building group of 101-1 shops']
D=[' fortune plaza ', 'shenzhen futian shennan avenue east HaoMing fortune plaza']

The first value in each list is the store name, and the second value is the store address.
A person can see that A and B are the same store, and that C and D are the same store, but how can this be done in code? Does Python have a library for this?

Please advise.

Thanks in advance.

CodePudding user response:

Here is one line of thinking: compare the text similarity of A and B, and of C and D. If your needs are simple, you can write your own comparison method: compute what share of the total words the two texts have in common, and treat them as the same once that share passes a chosen percentage. If you need higher precision, you can use an established measure, such as edit distance, Hamming distance, Euclidean distance, Jaccard similarity, or cosine similarity. You can search Baidu for the differences between them and for concrete implementations; ready-made ones should exist.
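A minimal sketch of two of the measures named above, using only the standard library. It assumes whitespace-separated words, as in the translated sample data; the function names and the 0.5 threshold are made up for illustration and would need tuning on real records.

```python
def words(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def jaccard_similarity(a, b):
    """Jaccard similarity of the two word sets: |A & B| / |A | B|."""
    set_a, set_b = set(words(a)), set(words(b))
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def looks_same(a, b, threshold=0.5):
    """Hypothetical decision rule: same place if enough words overlap."""
    return jaccard_similarity(a, b) >= threshold
```

For example, `looks_same(C[1], D[1])` compares the two address strings of C and D. Whether 0.5 is a sensible cutoff depends entirely on the data.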

CodePudding user response:

A=[' wahaha supermarket ', 'shenzhen futian district Shanghai road wahaha supermarket']
B=[' tissue supermarkets', 'shenzhen futian district Shanghai road tissue supermarkets']

But in this case A and B are not the same store, so it is hard to get the threshold right.
With fuzzy matching, wahaha and tissue get grouped into one store, but without fuzzy matching, Vingo wealth plaza and fortune plaza get split into two.

CodePudding user response:

A case like this calls for semantic analysis. Look into NLP, which may offer more advanced algorithms. In my personal opinion, you should do some preprocessing at the data collection stage so that the records are easier to tell apart. If that is not possible, the only option is to find ways to handle the special cases specially. As for the last A and B you mentioned, I think they are hard for a computer to tell apart.

CodePudding user response:

This data is read from Excel, and what I have to do is compare the names and addresses.
Low precision does not work and high precision does not work either, so I don't know what to do. It is giving me a headache.

CodePudding user response:

Use jieba to segment the text into words, then convert the results to sets.
Take the intersection of the sets.
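A rough sketch of that idea, assuming the third-party `jieba` package is installed (`jieba.lcut` returns a list of tokens). The whitespace fallback, the helper names, and the overlap ratio are additions for illustration and are not part of the suggestion above.

```python
try:
    import jieba  # third-party: pip install jieba
    default_tokenize = jieba.lcut
except ImportError:
    # Assumed fallback so the sketch still runs without jieba;
    # plain whitespace splitting is not useful for Chinese text.
    default_tokenize = str.split


def token_set(text, tokenize=default_tokenize):
    """Segment the text and return its distinct, non-blank tokens."""
    tokens = (token.strip().lower() for token in tokenize(text))
    return {token for token in tokens if token}


def shared_tokens(a, b, tokenize=default_tokenize):
    """Intersection of the two token sets."""
    return token_set(a, tokenize) & token_set(b, tokenize)


def overlap_ratio(a, b, tokenize=default_tokenize):
    """Shared tokens divided by the size of the smaller token set."""
    set_a, set_b = token_set(a, tokenize), token_set(b, tokenize)
    smaller = min(len(set_a), len(set_b))
    if smaller == 0:
        return 0.0
    return len(set_a & set_b) / smaller
```

Dividing by the smaller set means a short name fully contained in a longer one scores 1.0. That helps with cases like C and D, but it does not separate two different stores at the same address, so names and addresses may need to be scored separately.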

CodePudding user response:

Quoting the reply by ruthlessly non-infected on the 4th floor:

This data is read from Excel, and what I have to do is compare the names and addresses.
Low precision does not work and high precision does not work either, so I don't know what to do. It is giving me a headache.

How much data is there? If it is not much, you can first pick out the special cases and handle them with a mix of manual work and code.

CodePudding user response:


Quoting the reply by TRHX? Bob on the 6th floor:

Quote: the reply by ruthlessly non-infected on the 4th floor:
This data is read from Excel, and what I have to do is compare the names and addresses.
Low precision does not work and high precision does not work either, so I don't know what to do. It is giving me a headache.

How much data is there? If it is not much, you can first pick out the special cases and handle them with a mix of manual work and code.

Tens of thousands of records.

CodePudding user response:

Quoting the reply by chuifengde on the 5th floor:

Use jieba to segment the text into words, then convert the results to sets.
Take the intersection of the sets.

Using jieba for this is difficult.

CodePudding user response:

This is not easy to do well. I suggest asking a natural language processing expert.