How do I analyze data from two different sources with slightly different structures?

Time:04-14

For example, say there's a website that lists all the events in my area over the next 2 weeks, and another website that provides the same data labeled slightly differently. Say both websites provide the data in JSON format, and the first one looks like this:

"Events":{
  "id":1,
  "Name": "Rally",
  "Start time": "5pm"
}

The second website also provides the data as JSON, but instead of "Rally" the event is called "Rallies". Here's the JSON:

"Events":{
  "id":1,
  "Name": "Rallies",
  "Start time": "5pm"
}

It's obvious that these 2 events are the same thing, but how do I map them together? What methods can I use to recognize them as the same thing? And if there were 1000 of these events, how would that affect the speed of the program?

User response:

Try Levenshtein (edit) distance, which counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. If the distance between two event names is small, you can consider them the same event; if it is large, consider them different. You will have to experiment with different values before settling on the threshold that separates "small" from "large".
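As a rough illustration of this suggestion, here is a minimal Python sketch. The function names `levenshtein` and `match_events`, the `max_distance` default of 3, the lowercasing of names, and the idea of only comparing events that share the same "Start time" are all assumptions made for the example and are not part of the question. It also assumes each source has already been parsed into a list of dicts with the "Name" and "Start time" keys shown above.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the row we store as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca -> cb (or keep)
            ))
        previous = current
    return previous[-1]


def match_events(source_a, source_b, max_distance=3):
    """Pair each event in source_a with the closest-named event in
    source_b, if that distance is at most max_distance.

    Assumption for this sketch: events can only match when their
    "Start time" values are identical, which also cuts down the
    number of string comparisons.
    """
    matches = []
    for ev_a in source_a:
        best, best_dist = None, max_distance + 1
        for ev_b in source_b:
            if ev_a["Start time"] != ev_b["Start time"]:
                continue
            d = levenshtein(ev_a["Name"].lower(), ev_b["Name"].lower())
            if d < best_dist:
                best, best_dist = ev_b, d
        if best is not None:
            matches.append((ev_a, best, best_dist))
    return matches


site_a = [{"id": 1, "Name": "Rally", "Start time": "5pm"}]
site_b = [{"id": 1, "Name": "Rallies", "Start time": "5pm"}]
print(match_events(site_a, site_b))
```

On speed: comparing every event from one source against every event from the other means 1000 x 1000 = 1,000,000 distance computations for 1000 events per source, and each computation takes time proportional to the product of the two name lengths. Grouping events by another field first (start time in this sketch) is one way to reduce the number of pairs that need a distance check.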
